The objective of this study was to conduct applied research directed
toward understanding the relationship between the complexity or efficiency
of algorithms and the overall quality of computer software. The final
report is presented in a two volume series consisting of a total of eight
parts. This volume, containing Parts 3 through 8, describes the results of
several technical investigations which were conducted.

Part 3 is a tutorial on computational algebra, illustrating the nature of
[...] polynomials and matrices.

Part 4 develops a systematic approach to the analysis of algorithms. The
[...] probability distribution function of the program variables. This
technique is applied to several simple algorithms, sorting and searching
algorithms, and a tree insertion/deletion algorithm.

Part 5 is an experimental analysis of a fast, new sorting method called
DPS (distributive partitioning sorting). It develops a framework for
conducting such experiments, and proposes several improvements to DPS for
dealing with data from unknown or skewed distributions.

Part 6 applies order statistics to investigate the expected quality of
several approximation algorithms for the Euclidean traveling salesman
problem, known to be NP-complete.

Part 7 presents a survey of data base access methods for both univariate
and multivariate range queries. The techniques discussed include B-trees
and extensible hashing for the univariate case, and radix bit mapping and
K-D-B-trees for the multivariate case.

PREFACE


This  is  the  second  of  two  volumes  constituting  the  final 
technical  report  for  a  study  entitled  "Algorithmic  Complexity". 
The work was performed in support of the Information Sciences
Division,  Rome  Air  Development  Center,  under  U.S.  Air  Force 
Systems  Command  contract  F30602-79-C-0124.  The  duration  of  the 
project  was  from  June  1979  through  August  1981. 

The  research  described  herein  was  performed  by  members  of 
the  Department  of  Computer  Science  and  Experimental  Statistics 
at  the  University  of  Rhode  Island.  Dr.  Edmund  A.  Lamagna  served 
as  Principal  Investigator  for  this  effort.  Dr.  Leonard  J.  Bass 
was Co-Principal Investigator. Three graduate assistants (Messrs.
Lyle A. Anderson, Ralph E. Bunker, and Philip J. Janus) also
worked on the project. Technical guidance was provided
by  Mr.  Joseph  P.  Cavano,  RADC  Project  Engineer. 

The study consists of eight parts, whose titles are:

1.  Measures  of  Algorithmic  Efficiency:  An  Overview  (Lamagna) 

2.  The  Performance  of  Algorithms:  A  Research  Plan  (Lamagna, 
Bass,  and  Anderson) 

3.  Fast  Computer  Algebra  (Lamagna) 

4.  Systematic  Analysis  of  Algorithms  (Anderson) 

5.  Adaptive  Methods  for  Unknown  Distributions  in  Distributive 
Partitioning  Sorting  (Janus) 

6.  Expected  Behavior  of  Approximation  Algorithms  for  the 
Euclidean  Traveling  Salesman  Problem  (Lamagna  with  E.  J. 
Carney  and  P.  V.  Kamat) 

7.  Data  Base  Access  Methods  (Bass) 

8. An Experimental Evaluation of the Frame Memory Model of a
Data Base Structure (Bunker and Bass)

Volume I contains Parts 1 and 2, comprising a general
introduction to the entire series and a research plan. Volume
II contains the remaining six parts, describing the results of
several technical investigations which were conducted.
ALGORITHMIC COMPLEXITY
Part 3

FAST COMPUTER ALGEBRA

by

Edmund A. Lamagna

Abstract

New algorithms for solving familiar algebraic problems on computers have
recently been devised. These methods are more efficient than classical
ones for large problem sizes, and some can be shown to be optimal. This
tutorial illustrates these ideas by examining the problems of raising a
number to a power, evaluating a polynomial at one or several points,
and multiplying polynomials and matrices.


This work was supported by Air Force Systems Command, Rome Air Development
Center, under Contract No. F30602-79-C-0124.


FAST COMPUTER ALGEBRA


The astounding speed of modern digital computers has made it possible to
perform computations of a size that would be completely infeasible without
their use. For example, the fastest computers of today can solve a system
of one hundred simultaneous linear equations in a hundred unknowns in a
matter of seconds. If a person could perform one arithmetic operation, such
as an addition or multiplication, per minute and worked on the problem non-stop
using the classical method of Gaussian elimination, it would take almost one
year to obtain the same result. In fact, under the same assumptions, it
would take a person just over a day to solve a system of only a dozen equations.
The calculations of our tireless human computer are, of course, far more
susceptible to error.

Before the advent of digital computers, the sizes of most algebraic and
numeric problems which could be solved were severely limited, although it was
known how to solve large problems in principle. Because the sizes of the
problems tackled by hand were small, applying a simple formula usually sufficed
to produce the desired results. Little attention was generally paid to finding
computationally efficient, but perhaps longer, formulas or methods of calculation.

During the past decade, a new branch of mathematical computer science
known as analysis of algorithms and computational complexity has blossomed.
The goal of this field is to compare the relative efficiency of alternative
techniques for solving a problem and, whenever possible, to prove that some
method is the best one could hope to find. As a result of this work, a
number of surprising new algorithms, or computational procedures, have been
developed. These techniques sometimes seem counterintuitive at first and often
do not outperform the classical methods for small problem sizes. However, as
the problem size increases, the improvement in execution speed on a computer
can be quite dramatic.
As a simple example of such a result, we consider the problem of multiplying
two complex numbers. A single complex number is generally denoted by a pair of
values representing its real and imaginary parts. The product of two such
numbers $a+bi$ and $c+di$ is given by the formula $(ac-bd)+(ad+bc)i$, where
$i$ is the square root of $-1$.

Suppose we are given $a$ and $b$, representing the real and imaginary parts of
the first complex number, and $c$ and $d$, representing the corresponding parts of
the second number, and are asked to calculate the real and imaginary parts of
their product. Computers represent and operate on complex numbers in just
this manner. Applying the formula for complex product, four multiplications
(viz., $ac$, $bd$, $ad$, and $bc$), one subtraction (viz., $ac-bd$), and one
addition (viz., $ad+bc$) are used.

An alternative method for computing the result is as follows. First add
$a$ to $b$ and $c$ to $d$, multiplying these two sums:
$m_1 = (a+b) \cdot (c+d) = ac+ad+bc+bd$. Next form the two products
$m_2 = a \cdot c$ and $m_3 = b \cdot d$. The real part of the complex
product can be formed as $m_2 - m_3$ and the imaginary part as
$m_1 - m_2 - m_3$. This method uses three multiplications, two additions,
and three subtractions. Although this new method performs eight operations
to the classical method's six, it does use one fewer multiplication. But a
multiplication operation executes far more slowly than either an addition
or subtraction on a computer. (Additions and subtractions execute at
comparable speeds.) If a multiplication takes a not uncommon factor of ten
times longer, the new procedure for complex product will run about 20%
faster than the classical one. Due to the speed of digital computers, this
improvement will go unnoticed if only a few complex products are to be
taken; however, it can become increasingly important as the amount of
work to be done grows.
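
The two procedures for the complex product can be set side by side in a short
sketch. The function names and the cost model (a multiplication costing ten
times an addition or subtraction) are our own illustrative assumptions, not
part of the original report.

```python
def complex_product_classical(a, b, c, d):
    """(a+bi)(c+di) using 4 multiplications, 1 subtraction, 1 addition."""
    return a * c - b * d, a * d + b * c


def complex_product_fast(a, b, c, d):
    """Same product using 3 multiplications, 2 additions, 3 subtractions."""
    m1 = (a + b) * (c + d)   # equals ac + ad + bc + bd
    m2 = a * c
    m3 = b * d
    return m2 - m3, m1 - m2 - m3


def relative_cost(mult_cost=10, add_cost=1):
    """Total cost of each method under an assumed per-operation cost."""
    classical = 4 * mult_cost + 2 * add_cost
    fast = 3 * mult_cost + 5 * add_cost
    return classical, fast
```

Under the assumed costs the totals are 42 and 35 units, a ratio of 1.2, which
is the "about 20% faster" figure quoted in the text.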
In order to compare the efficiencies of alternative algorithms for
some algebraic problem, we will have to make more precise just what types
of computations will be allowed. Researchers who study the complexity of
algebraic problems use a model of computation called a "straight-line
algorithm". Within this framework we are given a set of input data plus
any constants we choose to work with. Algorithms consist of a sequence
of steps in which arithmetic operations are applied to the input data,
constants, or the results obtained in previous computation steps. Figure
2 shows two straight-line algorithms which evaluate the polynomial
p(x) = [...] given the value of the variable x as data.

To assess the efficiency of an algorithm, we will count either the
total number of arithmetic operations performed or the number of some
specific type (e.g., multiplications). This model of computation ignores
many practical considerations which will affect the running time of an
algorithm if it is actually programmed to be executed on a computer. In a
programmed realization, the operations to be performed are systematically
specified using loops and tests so that the program will work for all
input sizes. The straight-line algorithm paradigm neglects the cost of
the overhead associated with loop control and testing operations, as well
as the time required to fetch and store information inside a computer's
memory. These costs can vary greatly from computer to computer, and will
not even be the same for two programming language compilers implemented on
the same machine. Fortunately the overall times of the algorithms studied
are driven primarily by the underlying structure of the arithmetic
operations performed, rather than such overhead considerations, so the
results obtained are generally accurate to within a small constant factor
for actual implementations.
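
To make the straight-line model concrete, the following minimal sketch runs a
list of steps and counts the arithmetic operations used. The step format, the
function name run_straight_line, and the sample polynomial x^2 + 2x + 1 are
hypothetical choices made for illustration; they are not taken from Figure 2.

```python
import operator

OPERATIONS = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "/": operator.truediv}


def run_straight_line(steps, inputs):
    """Run steps of the form (target, op, left, right) and count operations.

    Operands are either names (input data or earlier results) or numeric
    constants. Returns the value of the last step and the operation counts.
    """
    values = dict(inputs)
    counts = {symbol: 0 for symbol in OPERATIONS}
    for target, op, left, right in steps:
        u = values[left] if isinstance(left, str) else left
        v = values[right] if isinstance(right, str) else right
        values[target] = OPERATIONS[op](u, v)
        counts[op] += 1
    return values[steps[-1][0]], counts


# Two straight-line algorithms for the same (hypothetical) polynomial.
direct = [("t1", "*", "x", "x"), ("t2", "*", 2, "x"),
          ("t3", "+", "t1", "t2"), ("t4", "+", "t3", 1)]
factored = [("u1", "+", "x", 1), ("u2", "*", "u1", "u1")]
```

Both step lists compute the same value, but the second uses one multiplication
and one addition where the first uses two of each. There are no loops or tests
in either list, which is exactly what the model assumes.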


Evaluation  of  Powers 


Suppose we are given a real number $x$ and a positive integer $n$ and
are asked to find the value of $x^n$. The obvious way to solve this problem
is to start with $x$ and multiply by $x$ a total of $n-1$ times. For example,
to find $x^{32}$ we compute each of the partial results
$x^2, x^3, x^4, \ldots, x^{31}, x^{32}$ and arrive at the desired answer
after 31 multiplication steps. We will call this algorithm the "brute force"
method of computing $x^n$.

A more efficient way to arrive at $x^{32}$ is by repeated squaring of
each partial result. For example, at the first step we square $x$ to obtain
$x^2$. At the second step $x^2$ is squared to yield $x^4$, and so on. Using
this technique we arrive at $x^{32}$ in 5 multiplications via the following
sequence of partial results: $x^2, x^4, x^8, x^{16}, x^{32}$.

The method of repeated squaring can be used to compute $x^n$ with
$\log_2 n$ multiplications when $n$ is a power of two. (If $y = \log_2 x$,
the base-2 logarithm of $x$, then $2^y = x$.) This achieves an exponential
improvement over the brute force method. The difficulty is that the
algorithm can only be used directly when $n$ is a power of two.

An algorithm called the "binary method" generalizes the principle
of repeated squaring to work for all values of $n$. To apply this technique,
we begin by writing down the binary representation of the number $n$ with
any leading zeros deleted. For example, if $n=21$ we write $21 = (10101)_2$.
Ignoring the first bit in the binary representation (which must be 1),
we next replace each remaining 1 by the letters SX and each 0 by the letter
S. When $n=21$ we obtain the sequence of letters SSXSSX. This sequence
yields a rule for evaluating $x^n$ if we interpret each S to mean "square
the result of the previous step" and each X to mean "multiply the result
of the previous step by x".


In our example, we begin by squaring $x$ to obtain $x^2$ since the first
letter in the sequence is S. We next square this partial result to obtain
$x^4$ since the second letter is again S. Because the third letter is X,
we multiply the result of the second step by $x$ to yield $x^5$ at the third
step, and so on. Hence we arrive at $x^{21}$ by the following sequence of
partial results: $x^2, x^4, x^5, x^{10}, x^{20}, x^{21}$.
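
The binary method as just described can be sketched as follows. The function
names are our own; the S/X rule is built from Python's built-in binary
formatting of n.

```python
def binary_method_rule(n):
    """Return the S/X letter sequence for a positive integer n."""
    bits = bin(n)[2:]                      # binary digits, no leading zeros
    # Skip the leading 1; each remaining 1 becomes "SX", each 0 becomes "S".
    return "".join("SX" if bit == "1" else "S" for bit in bits[1:])


def exponents_from_rule(rule):
    """Exponents of the partial results produced by following the rule."""
    exponents = []
    e = 1
    for letter in rule:
        e = 2 * e if letter == "S" else e + 1
        exponents.append(e)
    return exponents


def power_binary(x, n):
    """Compute x**n by following the S/X rule for n."""
    result = x
    for letter in binary_method_rule(n):
        result = result * result if letter == "S" else result * x
    return result
```

For n = 21 the rule is SSXSSX and the exponents visited are 2, 4, 5, 10, 20,
21, matching the worked example.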

We now wish to investigate the number of multiplication steps the
binary method uses to compute $x^n$. There are $\lfloor \log_2 n \rfloor + 1$
bits in the binary representation of $n$. ($\lfloor x \rfloor$ denotes the
"floor" of $x$, or the largest integer less than or equal to $x$.) Let
$\nu(n)$ denote the number of these bits which are 1. Since one S occurs in
the evaluation sequence for each bit in the binary representation of $n$
other than the first, the number of squaring operations used is
$\lfloor \log_2 n \rfloor$. Furthermore, the number of X's in the sequence
is just one less than $\nu(n)$. Hence $\lfloor \log_2 n \rfloor + \nu(n) - 1$
multiplications are used overall. Since
$\nu(n) \le \lfloor \log_2 n \rfloor + 1$, with equality holding when all
the bits in $n$'s representation are 1, the number of steps in the binary
method does not exceed $2 \lfloor \log_2 n \rfloor$.
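
The count derived above is easy to check numerically. In this small sketch
(the function name is ours) the floor of the base-2 logarithm is obtained from
the bit length of n rather than from floating-point logarithms.

```python
def binary_method_cost(n):
    """Multiplications used by the binary method: floor(log2 n) + nu(n) - 1."""
    floor_log2 = n.bit_length() - 1        # number of bits minus one
    ones = bin(n).count("1")               # nu(n), the number of 1 bits
    return floor_log2 + ones - 1


def within_bound(n):
    """Check that the cost does not exceed 2 * floor(log2 n)."""
    return binary_method_cost(n) <= 2 * (n.bit_length() - 1)
```

For example, n = 21 = (10101)_2 gives 4 + 3 - 1 = 6 multiplications, and
n = 15 = (1111)_2 gives 3 + 4 - 1 = 6, which meets the bound with equality.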

The smallest value of $n$ for which the binary method is not optimal
is 15. The binary method uses 6 multiplications to evaluate $x^{15}$, with
the sequence SXSXSX giving rise to the partial results
$x^2, x^3, x^6, x^7, x^{14}, x^{15}$. However, $x^{15}$ can be calculated in
5 steps by first finding $y = x^3$ (two multiplications), and raising $y$ to
the fifth power with three more multiplications since
$y^5 = (x^3)^5 = x^{15}$.

This method for computing $x^{15}$ is based on the realization that 15 can
be factored as 3 times 5. In general if the number $n$ can be factored as
$n = p \cdot q$, then $x^n$ can be evaluated by first computing $y = x^p$
and then calculating $y^q$. We now describe an algorithm called the "factor
method" which is based upon this principle.


Algorithm F: factor method. (Note that a prime number is one
having no integer factors other than 1 and itself.)

F1. If $n=1$, we have $x^n$ with no calculation.

F2. If $n$ is prime, calculate $x^n$ by first finding $x^{n-1}$ using the
factor method; then multiply this quantity by $x$.

F3. Otherwise, write $n$ as $p \cdot q$, where $p$ is the smallest prime factor
of $n$ and $q>1$. Calculate $x^n$ by first finding $x^p$ via the factor method;
then raise this quantity to the $q$th power, again via the factor method.
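
Steps F1 through F3 translate almost directly into a recursive routine. The
sketch below is our own rendering; the helper smallest_prime_factor and the
one-element list used as a multiplication counter are illustrative choices.

```python
def smallest_prime_factor(n):
    """Smallest prime factor of n >= 2 (n itself when n is prime)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def factor_method_power(x, n, counter):
    """Compute x**n by Algorithm F, adding each multiplication to counter[0]."""
    if n == 1:                                   # F1: nothing to do
        return x
    p = smallest_prime_factor(n)
    if p == n:                                   # F2: n is prime
        counter[0] += 1
        return factor_method_power(x, n - 1, counter) * x
    y = factor_method_power(x, p, counter)       # F3: y = x**p ...
    return factor_method_power(y, n // p, counter)   # ... then y**q


def factor_method_cost(n):
    """Number of multiplications Algorithm F uses for exponent n."""
    counter = [0]
    factor_method_power(2, n, counter)
    return counter[0]
```

Running factor_method_cost on 15, 21, 33 and 23 reproduces the counts 5, 6, 7
and 7 discussed in the surrounding text.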

We illustrate this technique by showing how $x^{21}$ is evaluated. First,
21 is factored as $3 \cdot 7$, and $x^3$ is calculated by repeated use of the
algorithm with 2 multiplications. Our problem reduces to calculating $y^7$
where $y = x^3$. Since 7 is a prime, $y^7$ will be computed by first finding
$y^6$ and then multiplying by $y$. Repeated use of the algorithm reveals
that $y^6$ will be found by taking $(y^2)^3$. Letting $z = y^2$, the steps
used to evaluate $x^{21}$ are: (1) $x \cdot x = x^2$,
(2) $x^2 \cdot x = x^3 = y$, (3) $x^3 \cdot x^3 = x^6 = y^2 = z$,
(4) $x^6 \cdot x^6 = x^{12} = z^2$,
(5) $x^{12} \cdot x^6 = x^{18} = z^3 = (y^2)^3$, and
(6) $x^{18} \cdot x^3 = x^{21} = y^6 \cdot y$.

Although the performance of the factor method is better than that
of the binary method on the average, there are instances when the binary
method is superior. The smallest such case is $n=33$, where the factor
method uses 7 multiplications and the binary method only 6. In fact
there are infinitely many values of $n$ for which the factor method is
better than the binary method, and vice versa. Moreover, neither the
factor method nor the binary method need be optimal. The smallest such
case is $n=23$, where both the factor and binary methods use 7 multiplications.
However, $x^{23}$ can be calculated with 6 multiplications as follows:
(1) $x \cdot x = x^2$, (2) $x^2 \cdot x = x^3$, (3) $x^3 \cdot x^2 = x^5$,
(4) $x^5 \cdot x^5 = x^{10}$, (5) $x^{10} \cdot x^{10} = x^{20}$,
and (6) $x^{20} \cdot x^3 = x^{23}$.
Since neither the factor method nor the binary method is always
optimal, it would seem worthwhile to investigate just how good these two
methods are. Let $P(n)$ denote the minimum number of multiplications
required to calculate $x^n$ regardless of the method employed. Observe
that the greatest power of $x$ which can be obtained using $k$
multiplications is $x^{2^k}$, and this is obtained by successively squaring
$x$ at each step of the computation. Thus in order to compute a power of $x$
as large as $x^n$, we must use at least $k$ multiplications where
$2^k \ge n$, or equivalently, $k \ge \lceil \log_2 n \rceil$.
($\lceil x \rceil$ denotes the "ceiling" of $x$, or the smallest
integer greater than or equal to $x$.) Therefore
$P(n) \ge \lceil \log_2 n \rceil$.

The above result provides a lower bound on the number of multiplications
required to compute $x^n$. It states that at least $\lceil \log_2 n \rceil$
multiplications are necessary, but gives no indication at all of whether
this number is sufficient. Our earlier analysis of the binary method
further reveals that $P(n) \le 2 \lfloor \log_2 n \rfloor$, and hence the
binary method is guaranteed to be efficient in the sense that it never uses
more than twice the minimal number of multiplications. The factor method
demonstrates that if $n = p \cdot q$, then $P(n) \le P(p) + P(q)$.

The tree in Figure 4 gives a minimal, or optimal, multiplication
sequence when $n$ is 100 or less. To calculate $x^n$ we locate $n$ in the tree.
The path from the root (bottom) of the tree to $n$ indicates the sequence
of exponents which occurs in one optimal evaluation of $x^n$. The value of
$P(n)$, for $1 \le n \le 100$, is simply the length of the path in the tree
from the root to $n$. For example, to compute $x^{31}$ we find that we should
calculate the following powers of $x$: 2, 3, 5, 10, 11, 21, 31. Hence an
optimal chain is given by: (1) $x \cdot x = x^2$, (2) $x^2 \cdot x = x^3$,
(3) $x^3 \cdot x^2 = x^5$, (4) $x^5 \cdot x^5 = x^{10}$,
(5) $x^{10} \cdot x = x^{11}$, (6) $x^{11} \cdot x^{10} = x^{21}$,
(7) $x^{21} \cdot x^{10} = x^{31}$. The idea behind the method is that any
number in the tree can be written as the sum of two numbers, or twice a
single number, on the path between itself and the root. The sums formed to
reach $n$ in the tree correspond to the intermediate powers formed in
calculating $x^n$. Of course we knew all along that
$x^a \cdot x^b = x^{a+b}$.
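
The property just stated, that every exponent on the path is the sum of two
earlier exponents (possibly the same one twice), can be checked mechanically,
and for small n a shortest such sequence can be found by exhaustive search.
The sketch below is our own illustration and is not the construction used to
draw the tree in the figure; it simply tries every increasing sequence up to a
growing length limit.

```python
def is_valid_chain(chain):
    """True if chain starts at 1 and each later entry is a sum of two
    earlier entries (an entry may be added to itself)."""
    if not chain or chain[0] != 1:
        return False
    for k in range(1, len(chain)):
        earlier = chain[:k]
        if not any(chain[k] == a + b for a in earlier for b in earlier):
            return False
    return True


def shortest_chain(n):
    """A shortest exponent sequence 1, ..., n found by iterative deepening."""
    if n == 1:
        return [1]

    def extend(chain, limit):
        last = chain[-1]
        if last == n:
            return list(chain)
        steps_left = limit - (len(chain) - 1)
        # Doubling at every remaining step is the fastest possible growth.
        if steps_left == 0 or (last << steps_left) < n:
            return None
        sums = {a + b for a in chain for b in chain if last < a + b <= n}
        for s in sorted(sums, reverse=True):
            chain.append(s)
            found = extend(chain, limit)
            chain.pop()
            if found:
                return found
        return None

    limit = (n - 1).bit_length()           # the lower bound ceil(log2 n)
    while True:
        found = extend([1], limit)
        if found:
            return found
        limit += 1
```

The number of multiplications is one less than the length of the returned
list. The search is exponential in the worst case, so it is only practical for
small exponents such as those covered by the tree.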

In our discussion of the power evaluation problem we have concentrated
on the operation of multiplication. Neither addition nor subtraction is of
any help in evaluating powers. But what about division? We have just seen
that 7 multiplications are minimal to compute $x^{31}$. However if division is
allowed, $x^{31}$ can be found in 6 operations by calculating $x^{32}$ with 5
multiplications via repeated squaring, and then dividing this quantity by $x$.
Unfortunately, the availability of division does not alter our lower bound
on $P(n)$, and hence cannot substantially improve on the algorithms we have
discussed.

The problem of finding an optimal computation sequence for $x^n$ has a
long and interesting history. Although Arnold Scholz formally raised the
question in 1937, before the appearance of digital computers, algorithms for
computing $x^n$ had been studied for some time earlier. A version of the binary
method was expounded by the famous French mathematician Adrien M. Legendre in
1798, and it is closely related to a multiplication procedure used by Egyptian
mathematicians as early as 1800 B.C. Several authors have published statements
of the optimality of the method but, as we have seen, these claims are false.

We note in closing that the algorithms studied work not only for single numbers,
but carry over to the problems of raising polynomials and matrices to a power.
Evaluation of Polynomials

The next problem we consider is that of evaluating a general polynomial
of degree $n$. Such a polynomial may be written as
$p(x) = c_n x^n + c_{n-1} x^{n-1} + \cdots + c_1 x + c_0$.
We want to devise an efficient scheme to evaluate any such polynomial when
given the values of the coefficients $c_n, c_{n-1}, \ldots, c_1, c_0$ and the
variable $x$ as data.

The usual method for evaluating a polynomial is to apply directly the
general formula given above. In so doing we first calculate each of the
powers of $x$ using $n-1$ multiplications. Next we take the product of each
coefficient and its corresponding power. This requires another $n$
multiplications. Finally the $n+1$ terms in the canonical expansion are
summed with $n$ additions. Thus a total of $2n-1$ multiplications and $n$
additions are used by this method.

When the term-by-term method described above is applied to the degree
two polynomial $c_2 x^2 + c_1 x + c_0$, three multiplications are performed:
$x \cdot x$, $c_2 \cdot x^2$, $c_1 \cdot x$. This number can be reduced to two
by observing that $x$ can be factored out of the first two terms, yielding the
formula $(c_2 x + c_1)x + c_0$. The number of additions remains two.

This insight suggests that we rearrange degree $n$ polynomials as
$p(x) = (\cdots((c_n x + c_{n-1})x + c_{n-2})x + \cdots + c_1)x + c_0$.
To evaluate the polynomial we start with $c_n$, multiply by $x$, add
$c_{n-1}$, multiply by $x$, add $c_{n-2}$, multiply by $x$, ..., add $c_0$.
This method was expounded in 1819 by William G. Horner in conjunction with
an efficient technique for finding the coefficients of the polynomial
$p(x+a)$. Today the method is usually referred to as "Horner's rule" although
it was actually devised by Isaac Newton over 100 years earlier in 1711.
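
The two evaluation schemes can be compared directly. In the sketch below the
function names and the convention of listing coefficients from the highest
degree down are our own; each function returns the value together with the
number of multiplications and additions it performed.

```python
def evaluate_term_by_term(coeffs, x):
    """Evaluate a polynomial of degree n >= 1 given [c_n, ..., c_1, c_0]."""
    n = len(coeffs) - 1
    mults = adds = 0
    powers = [1, x]                        # powers[k] holds x**k
    for _ in range(n - 1):                 # n-1 multiplications for x^2..x^n
        powers.append(powers[-1] * x)
        mults += 1
    total = coeffs[-1]                     # start from c_0
    for k in range(1, n + 1):              # n multiplications, n additions
        total += coeffs[n - k] * powers[k]
        mults += 1
        adds += 1
    return total, mults, adds


def evaluate_horner(coeffs, x):
    """Evaluate the same polynomial by nested multiplication."""
    result = coeffs[0]                     # start with c_n
    mults = adds = 0
    for c in coeffs[1:]:                   # multiply by x, add next coefficient
        result = result * x + c
        mults += 1
        adds += 1
    return result, mults, adds
```

For a cubic such as x^3 - 2x^2 + 3x + 1 the first routine reports 5
multiplications and 3 additions, the second 3 and 3.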

Horner's rule employs $n$ multiplication and $n$ addition steps to evaluate
polynomials of degree $n$. We might ask whether it is possible to do better.
The answer is no for general polynomials in which all of the coefficients and
the variable $x$ are left unspecified. Of course particular polynomials, like
the one examined in Figure 2, can be evaluated with fewer operations.


We can readily see that $n$ addition/subtraction steps are required because
any scheme for evaluating $p(x)$ clearly works when $x=1$ and
$p(1) = c_n + c_{n-1} + \cdots + c_0$.
This implies that Horner's rule can be adapted to find the sum of any $n+1$
numbers by letting these numbers play the roles of coefficients and setting
$x$ to 1. Since $n$ add/sub steps are required to sum $n+1$ numbers and our
adaptation of Horner's rule uses exactly this many, the method is optimal with
respect to the number of add/sub steps. A demonstration that $n$ mult/div steps
are also required is more complex. Since such a proof is based upon more
advanced concepts of linear algebra, we shall omit the details here.

Eduard G. Belaga, a Russian mathematician, first demonstrated the
necessity of $n$ add/sub steps in 1958. Another Russian, Viktor Pan, showed in
1966 that $n$ mult/div steps are also required. In 1971 Allan Borodin of the
University of Toronto further proved that Horner's method is uniquely optimal
in the sense that it is the only way to evaluate a general $n$th degree polynomial
with $2n$ arithmetic operations.

Next we consider the problem of evaluating an $n$th degree polynomial at
several points. Applying either the classical algorithm or Horner's method
at, say, each of $n$ points requires a number of operations proportional to $n^2$.
Using the concept of a "modular transform", we shall obtain an algorithm whose
performance is only slightly worse than linear in $n$, resulting in a considerable
speed-up for increasingly larger values of $n$.

Evaluating a polynomial at the single point $x=a$ is equivalent to finding
the remainder when $p(x)$ is divided by $x-a$. This follows from the Remainder
Theorem of algebra, since we can write $p(x) = (x-a)q(x) + r(x)$ where $q(x)$
and $r(x)$ are the quotient and remainder polynomials, respectively, when the
division is performed. Note that the degree of $q(x)$ is one less than that of
$p(x)$ and $r(x)$ is a constant. Setting $x=a$ we obtain the desired result,
that $p(a)$ is equal to the constant $r$.
This technique can be generalized to the situation in which we wish to
evaluate the polynomial $p(x)$ at $k$ points, $a_1, a_2, \ldots, a_k$, where
$k \le$ degree of $p$. We first form the product
$m(x) = \prod_{i=1}^{k} (x - a_i)$. Again from the Remainder Theorem,
we find that $p(x) = m(x)q(x) + r(x)$ where $q(x)$ and $r(x)$ are the quotient
and remainder polynomials when $p(x)$ is divided by $m(x)$. At each of the
points $x = a_i$ the value of $m(a_i) = 0$, so we have $p(a_i) = r(a_i)$. Since
the degree of $r$ is less than that of $p$, we have reduced our problem to the
simpler one of evaluating $r(x)$ at the $k$ points.

A common property of many fast algorithms is that they reduce a problem
to a simpler one by dividing it into two subproblems, each of which is at most
half as difficult as the original problem. Applying this principle, a fast
algorithm for the problem of evaluating an $n$th degree polynomial at $n+1$
points suggests itself. First, divide the $n+1$ points in half and form the
polynomials $m_1(x) = \prod_{i=1}^{n/2} (x - a_i)$ and
$m_2(x) = \prod_{i=n/2+1}^{n+1} (x - a_i)$. Analogous to what we did
above, we next divide $p(x)$ by $m_1(x)$ to get $r_1(x)$ and by $m_2(x)$ to get
$r_2(x)$. We have now reduced our original problem to that of evaluating the
two $n/2$-th degree polynomials $r_1(x)$ and $r_2(x)$ at $n/2+1$ points. To do
this we apply the method repeatedly.

For example, suppose we wish to evaluate the polynomial
$p(x) = x^3 - 2x^2 + 3x + 1$ at the points $x = -1, 0, 1, 2$. We first form
$m_1(x) = (x+1)x = x^2 + x$ and $m_2(x) = (x-1)(x-2) = x^2 - 3x + 2$.
Dividing $p(x)$ by $m_1(x)$ and $m_2(x)$, we obtain $r_1(x) = 6x + 1$ and
$r_2(x) = 4x - 1$. Dividing $r_1(x)$ by $x+1$ and $x$, we find from the
remainders that $p(-1) = -5$ and $p(0) = 1$. Similarly dividing $r_2(x)$ by
$x-1$ and $x-2$, we get that $p(1) = 3$ and $p(2) = 7$.
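
The halving scheme can be sketched in a few lines. This is only an
illustration of the recursive structure: the helper names are ours,
polynomials are stored as coefficient lists with the constant term first, and
the multiplication and division routines are the classical quadratic ones, so
the sketch does not by itself achieve the fast running time. It also rebuilds
each modulus when needed rather than keeping the products in a tree.

```python
def poly_mul(p, q):
    """Classical product of two coefficient lists (constant term first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_rem(p, m):
    """Remainder of p divided by a monic polynomial m of degree >= 1."""
    r = list(p)
    dm = len(m) - 1
    for k in range(len(r) - 1, dm - 1, -1):
        factor = r[k]                      # leading coefficient of m is 1
        if factor:
            for j in range(dm + 1):
                r[k - dm + j] -= factor * m[j]
    return r[:dm]


def modulus(points):
    """Product of (x - a) over the given points."""
    m = [1]
    for a in points:
        m = poly_mul(m, [-a, 1])
    return m


def evaluate_many(p, points):
    """Values of p at each point, by repeated halving and remaindering."""
    if len(points) == 1:
        return [poly_rem(p, [-points[0], 1])[0]]
    half = len(points) // 2
    left, right = points[:half], points[half:]
    r1 = poly_rem(p, modulus(left))
    r2 = poly_rem(p, modulus(right))
    return evaluate_many(r1, left) + evaluate_many(r2, right)
```

With p stored as [1, 3, -2, 1] and the points -1, 0, 1, 2, the intermediate
remainders are [1, 6] and [-1, 4], that is 6x + 1 and 4x - 1, as in the worked
example.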

The tree in Figure 6 illustrates the manner in which the products $m_i(x)$,
or "moduli", are built up in general. The divisions of $p(x)$ and the subsequent
remainders are computed in the reverse order. If the products are formed
moving from the top of the tree downward, and then the divisions are performed
going from the bottom of the tree upward, only one polynomial multiplication
and one polynomial division need be performed for each node in the tree.
It turns out that the number of scalar arithmetic operations required
to either multiply or divide two polynomials of like degree is the same to
within a multiplicative constant. Because of this, the running time of our
algorithm for evaluating an $n$th degree polynomial at $n+1$ points is driven by
the time required to symbolically multiply two polynomials given their
coefficients. In fact the overall running time of the modular transform
algorithm described above is just a factor of $\log n$ greater than the time
required to multiply two polynomials of degree $n$. With this motivation we
now turn our attention to the polynomial multiplication problem.


Polynomial  Multiplication 

In the polynomial multiplication problem, we are given two polynomials
represented by their coefficients as data and are asked to compute the
coefficient representation of their product. Let the two input polynomials be
$p(x) = \sum_{i=0}^{n-1} a_i x^i$ and $q(x) = \sum_{j=0}^{n-1} b_j x^j$.
(It will be easier to work with polynomials having $n$ coefficients and degree
$n-1$, rather than those of degree $n$.) Their product is a polynomial of
degree $2n-2$, $p(x) \cdot q(x) = \sum_{k=0}^{2n-2} c_k x^k$, whose
coefficients expressed in terms of the inputs are
$c_k = \sum_{i+j=k} a_i b_j$. Note that $c_k$ is the sum of all products of
the form $a_i b_j$ in which $i+j=k$. Thus $c_0 = a_0 b_0$,
$c_1 = a_1 b_0 + a_0 b_1$, $c_2 = a_2 b_0 + a_1 b_1 + a_0 b_2$, etc.

The classical algorithm for polynomial multiplication is to apply the
formula given above to compute each coefficient in the result directly. In so
doing $n^2$ scalar multiplications are used since every product of the form
$a_i b_j$, involving one of the $n$ coefficients from each input polynomial,
is formed exactly once. The total number of additions made is equal to the
number of $a_i b_j$ pairs less the number of coefficients formed, or
$n^2 - (2n-1)$. For example, if $p(x) = a_1 x + a_0$ and
$q(x) = b_1 x + b_0$, the product is
$p(x) \cdot q(x) = (a_1 b_1)x^2 + (a_1 b_0 + a_0 b_1)x + a_0 b_0$. Here $n=2$
and we see that $2^2 = 4$ multiplications (viz., $a_1 b_1$, $a_1 b_0$,
$a_0 b_1$, $a_0 b_0$) and $2^2 - (2 \cdot 2 - 1) = 1$ addition (viz.,
$a_1 b_0 + a_0 b_1$) are performed.
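
The operation counts for the classical algorithm can be confirmed with a short
sketch. The function name and the convention of storing coefficients with the
constant term first are our own assumptions.

```python
def multiply_classical(a, b):
    """Product of two n-coefficient polynomials, with operation counts.

    a and b list coefficients with the constant term first. Returns the
    2n-1 product coefficients, the multiplication count, and the addition
    count.
    """
    n = len(a)
    c = [None] * (2 * n - 1)
    mults = adds = 0
    for i in range(n):
        for j in range(n):
            product = a[i] * b[j]
            mults += 1
            if c[i + j] is None:           # first term of this coefficient
                c[i + j] = product
            else:                          # every later term costs one addition
                c[i + j] += product
                adds += 1
    return c, mults, adds
```

For n = 2 the routine reports 4 multiplications and 1 addition, and for n = 4
it reports 16 and 9, in agreement with the formulas $n^2$ and $n^2 - (2n-1)$.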
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It  turns  out  that  the  product  of  two  2-coefficient  polynomials  can  he 

found  with  3  multiplications,  instead  of  the  usual  U.  This  product  can  he 

expressed  in  terms  of  the  three  multiplications  m=a  ‘b  .  m  -ji  *6,  and 

too  211 

m  =(a  -w  )*(i  +b  )  as  p(«)"0{*)a»n  x*+(m  -m  -m  )x-m  ,  Although  this  scheme 

SIO  10  2  S21  1 

uses  U  add/suh  steps,  it  can  serve  as  the  basis  for  an  algorithm  to  multiply 
two  n-coefficient  polynomials  with  a  substantially  smaller  total  number  of 
operations  than  the  classical  algorithm  for  increasingly  larger  values  of  n. 
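The two schemes just described can be sketched in a few lines of Python. This is only an illustration: the function names `classical_multiply` and `three_mult_product` are hypothetical, and polynomials are assumed to be stored as coefficient lists with the lowest power first.

```python
def classical_multiply(a, b):
    """Classical product of two n-coefficient polynomials.

    Returns (coefficients, scalar multiplications, scalar additions).
    """
    n = len(a)
    assert len(b) == n, "both inputs are assumed to have n coefficients"
    c = [None] * (2 * n - 1)
    mults = adds = 0
    for i in range(n):
        for j in range(n):
            term = a[i] * b[j]
            mults += 1
            if c[i + j] is None:
                c[i + j] = term          # first term of c_k needs no addition
            else:
                c[i + j] += term
                adds += 1
    return c, mults, adds


def three_mult_product(a, b):
    """Product of two 2-coefficient polynomials using 3 multiplications."""
    a0, a1 = a
    b0, b1 = b
    m1 = a0 * b0
    m2 = a1 * b1
    m3 = (a1 + a0) * (b1 + b0)
    return [m1, m3 - m2 - m1, m2]        # coefficients of x^0, x^1, x^2


# p(x) = 5x + 3 and q(x) = 2x + 7
print(classical_multiply([3, 5], [7, 2]))
print(three_mult_product([3, 5], [7, 2]))
```

Both calls produce the same coefficients; the first also reports the 4 multiplications and 1 addition counted in the text for n=2.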

To see how such a reduction in work is possible, consider the case when
$n=4$. The classical algorithm uses $4^2=16$ multiplications to find the
product of the two polynomials $p(x) = a_3 x^3 + a_2 x^2 + a_1 x + a_0$ and
$q(x) = b_3 x^3 + b_2 x^2 + b_1 x + b_0$. Observe that we can split these
polynomials into upper and lower halves, expressing them as
$p(x) = s(x)x^2 + t(x)$ and $q(x) = u(x)x^2 + v(x)$ where
$s(x) = a_3 x + a_2$, $t(x) = a_1 x + a_0$, $u(x) = b_3 x + b_2$, and
$v(x) = b_1 x + b_0$. In this form the product
$p(x) \cdot q(x) = su\,x^4 + (sv + tu)x^2 + tv$.

If we take each of these 4 subproducts of 2-coefficient polynomials in
the classical way we use $4 \cdot 4=16$ multiplications. But instead we can
take advantage of our 3 multiplication scheme by forming the products
$m_1 = t \cdot v$, $m_2 = s \cdot u$, and $m_3 = (s+t) \cdot (u+v)$. Since
each of these 3 products of 2-coefficient polynomials can be found by a
repeated application of the 3 multiplication scheme, only $3 \cdot 3=9$
scalar multiplications are used. The desired result is
$p(x) \cdot q(x) = m_2 x^4 + (m_3 - m_2 - m_1)x^2 + m_1$. The details of this
scheme are illustrated in Figure 11.

Now let us generalize to the case of arbitrary size polynomials. For
simplicity we will assume that the number of coefficients, n, is a power of
two although a similar result can be derived for any value of n. We begin by
dividing the coefficients of the inputs into upper and lower halves, expressing
these polynomials as $p(x) = s(x)x^{n/2} + t(x)$ and
$q(x) = u(x)x^{n/2} + v(x)$. We next form the following products of
n/2-coefficient polynomials: $m_1 = t \cdot v$, $m_2 = s \cdot u$, and
$m_3 = (s+t) \cdot (u+v)$. In taking each of these products, we will apply the
algorithm recursively (i.e., repeatedly) until $n=1$. The desired product
$p(x) \cdot q(x)$ is ultimately formed as
$m_2 x^n + (m_3 - m_2 - m_1)x^{n/2} + m_1$.
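A minimal recursive sketch of this procedure follows, assuming coefficient lists stored lowest power first and n a power of two. The name `dc_multiply` is hypothetical; the function also returns the number of scalar multiplications so the count can be checked against the text.

```python
def dc_multiply(p, q):
    """Divide-and-conquer product of two n-coefficient polynomials.

    Returns (coefficients of the product, scalar multiplications used).
    """
    n = len(p)
    if n == 1:
        return [p[0] * q[0]], 1
    h = n // 2
    t, s = p[:h], p[h:]              # p(x) = s(x) x^(n/2) + t(x)
    v, u = q[:h], q[h:]              # q(x) = u(x) x^(n/2) + v(x)
    m1, k1 = dc_multiply(t, v)
    m2, k2 = dc_multiply(s, u)
    m3, k3 = dc_multiply([x + y for x, y in zip(s, t)],
                         [x + y for x, y in zip(u, v)])
    middle = [z - y - x for x, y, z in zip(m1, m2, m3)]   # m3 - m2 - m1
    result = [0] * (2 * n - 1)
    for i, coeff in enumerate(m1):
        result[i] += coeff           # m1
    for i, coeff in enumerate(middle):
        result[i + h] += coeff       # (m3 - m2 - m1) x^(n/2)
    for i, coeff in enumerate(m2):
        result[i + n] += coeff       # m2 x^n
    return result, k1 + k2 + k3


product, mults = dc_multiply([1, 2, 3, 4], [5, 6, 7, 8])
print(product, mults)
```

For the 4-coefficient inputs shown, the sketch uses 9 scalar multiplications rather than the classical 16.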

The approach to the polynomial multiplication problem described above is
an example of what is known as a "divide-and-conquer" algorithm. The idea
behind this common algorithm design technique is to split a problem into a
number of subproblems of the same kind involving disjoint subsets of the
inputs. The subproblems are solved separately by reapplying the
divide-and-conquer strategy, and then a method is found to combine the
solutions of these subproblems into a solution of the whole problem.
Frequently, as in the case of polynomial multiplication, a divide-and-conquer
approach can lead to a more efficient algorithm than a direct attack on the
original problem. We shall see another application of this paradigm later.

We now investigate the efficiency of the divide-and-conquer polynomial
multiplication algorithm. Let $M(n)$ denote the number of scalar
multiplications performed in taking the product of two n-coefficient
polynomials. Since the scalar multiplications made in forming the product of
the two original polynomials are exactly those used to compute the resulting
3 products of n/2-coefficient polynomials, we have that $M(n) = 3M(n/2)$.
This equation, called a recurrence relation, can be solved by
back-substitution as follows:
$M(n) = 3M(n/2) = 3^2 M(n/4) = \ldots = 3^k M(n/2^k)$. The process stops when
$2^k = n$ or $k = \log_2 n$, at which point we use the initial condition
$M(1)=1$ to obtain $M(n) = 3^{\log_2 n} = n^{\log_2 3}$. ($M(1)=1$ since one
multiplication is used to find the product of two 1-coefficient polynomials,
$a_0 b_0$.) Because $\log_2 3 \approx 1.59$, the divide-and-conquer algorithm
uses $n^{1.59}$ scalar multiplications to the classical algorithm's $n^2$.

We next obtain a recurrence relation for the number of scalar add/sub
steps, $A(n)$. The additions performed when multiplying two n-coefficient
polynomials arise from three sources: (1) recursively applying the algorithm
to find 3 products of n/2-coefficient polynomials, (2) performing 2 additions
of n/2-coefficient polynomials (viz., $s+t$, $u+v$) and 2 subtractions of
(n-1)-coefficient polynomials (viz., $m_3 - m_2 - m_1$), and (3) forming the
coefficients of the original product from the results of recursively applying
the procedure (e.g., the coefficient additions in the third step of Figure 11).
By definition, there are $3A(n/2)$ additions from the first source. The sum of
two polynomials is found by merely adding their corresponding coefficients,
so the second source contributes $2(n/2)+2(n-1) = 3n-2$ additions. The third
source generates $n-2$ additions. Hence the desired recurrence relation is
$A(n) = 3A(n/2) + 4n - 4$, whose solution with initial condition $A(1)=0$ is
$A(n) = 6n^{\log_2 3} - 8n + 2$.

We have just shown that both the number of multiplications and add/sub
steps in the divide-and-conquer algorithm grow proportionally to
$n^{\log_2 3}$. This can represent a substantial improvement over the
classical algorithm, where the number of operations grows as $n^2$. When
$n=8$ the total number of arithmetic operations performed by both algorithms
are comparable: 127 for divide-and-conquer to 113 for the classical method.
For larger values of n the divide-and-conquer method is superior.
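The operation counts quoted above can be reproduced directly from the recurrences $M(n) = 3M(n/2)$ and $A(n) = 3A(n/2) + 4n - 4$. The following is a small sketch with hypothetical helper names, assuming n is a power of two.

```python
def dc_counts(n):
    """(multiplications, add/sub steps) for the divide-and-conquer method."""
    if n == 1:
        return 1, 0                      # M(1) = 1, A(1) = 0
    m, a = dc_counts(n // 2)
    return 3 * m, 3 * a + 4 * n - 4


def classical_counts(n):
    """(multiplications, additions) for the classical method."""
    return n * n, n * n - (2 * n - 1)


for n in (2, 4, 8, 16, 32):
    print(n, sum(dc_counts(n)), sum(classical_counts(n)))
```

The row for n=8 gives the totals 127 and 113 cited in the text, and from n=16 onward the divide-and-conquer total is the smaller of the two.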

Is this the fastest that two polynomials can be multiplied? The method
just described is based on the fact that the product of 2-coefficient
polynomials can be found with 3 multiplications. It yields a general algorithm
for n-coefficient polynomials in which the number of scalar arithmetic
operations performed grows as $n^{\log_2 3}$. Using a divide-and-conquer
approach, it is possible to convert any scheme for computing the product of
two polynomials of some specific size m with p multiplications into a method
for multiplying arbitrarily large n-coefficient polynomials using
$O(n^{\log_m p})$ operations. (The O-notation denotes the order of magnitude
of a function's growth rate, ignoring any constant factors of
proportionality.) If m and p are such that $\log_m p < 1.59$, the resulting
algorithm will be asymptotically better than the one we have discussed. To
date no one has been able to produce a faster algorithm using such a strategy.


Algebraic  Transforms 

One  way  to  specify  a  polynomial  is  to  give  its  coefficients.  This  is 
the  only  representation  we  have  been  working  with  up  to  now.  A  well-known 
result  in  algebra  states  that  there  is  a  unique  polynomial  of  degree  less 
than  n  which  will  fit  through  any  n  points.  Thus  an  alternate  representation 
for  a  polynomial  is  to  give  its  values  at  n  points. 

The product of two n-coefficient polynomials is a polynomial with $2n-1$
coefficients. Such a polynomial can be uniquely represented by its value at
$2n-1$ points. This suggests a new method for multiplying the n-coefficient
polynomials $p(x)$ and $q(x)$. We begin by evaluating both $p(x)$ and $q(x)$
at $2n-1$ selected points, $x = \alpha_1, \alpha_2, \ldots, \alpha_{2n-1}$.
We next multiply together the corresponding values of the polynomials at
these points, forming $2n-1$ products $p(\alpha_i) \cdot q(\alpha_i)$. The
polynomial which uniquely fits these $2n-1$ values is the desired product
$p(x) \cdot q(x)$.
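The evaluate, multiply, interpolate sequence can be illustrated with a short sketch. It is not the fast method discussed later: it uses plain Lagrange interpolation with exact rational arithmetic, the sample points 0, 1, ..., 2n-2 are an arbitrary choice, and the function names are hypothetical.

```python
from fractions import Fraction


def evaluate(coeffs, x):
    """Value at x of the polynomial with the given coefficients (lowest first)."""
    value = 0
    for c in reversed(coeffs):
        value = value * x + c
    return value


def interpolate(points, values):
    """Coefficients of the unique polynomial through the given points.

    Plain Lagrange interpolation; simple rather than fast.
    """
    k = len(points)
    result = [Fraction(0)] * k
    for i in range(k):
        basis = [Fraction(1)]            # product of (x - points[j]), j != i
        denom = Fraction(1)
        for j in range(k):
            if j == i:
                continue
            basis = [Fraction(0)] + basis            # multiply by x ...
            for d in range(len(basis) - 1):
                basis[d] -= points[j] * basis[d + 1]  # ... minus points[j]
            denom *= points[i] - points[j]
        scale = values[i] / denom
        for d in range(k):
            result[d] += scale * basis[d]
    return result


def transform_multiply(p, q):
    """Product of two n-coefficient integer polynomials via 2n-1 sample values."""
    n = len(p)
    points = list(range(2 * n - 1))      # any 2n-1 distinct points would do
    values = [evaluate(p, x) * evaluate(q, x) for x in points]
    return [int(c) for c in interpolate(points, values)]


print(transform_multiply([1, 2, 3, 4], [5, 6, 7, 8]))
```

Exact fractions are used so that the interpolated coefficients come out as exact integers for integer inputs.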

This approach to multiplying two polynomials is called an algebraic
transform. Instead of dealing directly with the coefficients of $p(x)$ and
$q(x)$, as we have done previously, we first transform the coefficient
representation of p and q into another form, one in which the polynomials are
represented by their values at a collection of points. We perform the actual
multiplication on this second representation by taking the pairwise products
of the values of p and q at the sample points. We now have a representation
of the product polynomial in the form of its value at a number of points, and
must perform an inverse transformation to obtain the coefficient
representation of the result. This final step is called interpolation. The
entire process is illustrated in Figure 12.

In a previous section we considered the problem of evaluating a polynomial
at several arbitrarily selected points. The performance of the algorithm we
described is asymptotically the best of any method known to date. This
algorithm uses a number of basic arithmetic operations proportional to
$n(\log n)^2$ when evaluating an n-coefficient polynomial at $2n-1$ points.
Similarly, the best known algorithm for interpolating a polynomial through
$2n-1$ arbitrary points also performs a number of basic arithmetic operations
proportional to $n(\log n)^2$. (The more familiar classical interpolation
algorithms of Isaac Newton and the noted French mathematician Count Joseph
Louis Lagrange employ a number of operations growing as $n^2$.) Thus the
number of operations used in the transform method for multiplying two
polynomials is dominated by the evaluation and interpolation steps, rather
than the $2n-1$ pairwise multiplications, and is $O(n(\log n)^2)$.

In the transform just described, the values of x where the evaluation and
interpolation take place are arbitrarily chosen. It turns out that a judicious
choice of points can lead to a slightly improved algorithm. Observe that the
polynomial $p(x) = a_0 + a_1 x + \ldots + a_{n-1}x^{n-1}$ can be broken into a
sum of odd and even powered terms:
$p(x) = (a_0 + a_2 x^2 + \ldots + a_{n-2}x^{n-2}) + (a_1 x + a_3 x^3 + \ldots + a_{n-1}x^{n-1})$.
Substituting $y = x^2$, we have
$p(x) = (a_0 + a_2 y + \ldots + a_{n-2}y^{n/2-1}) + x(a_1 + a_3 y + \ldots + a_{n-1}y^{n/2-1}) = s(y) + x\,t(y)$.
Thus the problem of evaluating the n-coefficient polynomial $p(x)$ reduces to
the problem of evaluating two polynomials $s(y)$ and $t(y)$, of half that
size, plus three additional operations: $y = x^2$, $x \cdot t(y)$,
$s(y) + x\,t(y)$. However, we are still faced with the task of evaluating both
s and t at the same number of points, and no reduction in the number of
operations has occurred yet.

When the points where the evaluation and interpolation take place are
chosen to be the n-th roots of unity, that is, the n roots of the equation
$x^n = 1$, the process can be speeded. If $\omega$ is a primitive n-th root of
unity, then $\omega^n = 1$ and $\omega^k \neq 1$ for all $0 < k < n$.
Moreover, if n is even then $\omega^2$ is a primitive n/2-th root of 1 since
$\omega^n = (\omega^2)^{n/2} = 1$. Furthermore, $\omega^{n/2} = -1$, which is
easily verified by $(-1)^2 = (\omega^{n/2})^2 = \omega^n = 1$.

We now return to the problem of evaluating our odd and even sum polynomial
$p(x) = s(y) + x\,t(y)$, where $y = x^2$, but at the n distinct points
$x = \omega^j$ for $0 \le j \le n-1$. Then
$p(\omega^j) = s(\omega^{2j}) + \omega^j t(\omega^{2j})$ and
$p(\omega^{j+n/2}) = s(\omega^{2j}) - \omega^j t(\omega^{2j})$, since
$\omega^{j+n/2} = -\omega^j$ and $\omega^{2(j+n/2)} = \omega^{2j}$. These
formulas reveal how the problem of evaluating the polynomial $p(x)$ at n
points can be divided into two subproblems which involve evaluating
polynomials of half the original size at half as many points. The subproblems
are the evaluation of s and t, both having n/2 coefficients, at the points
$\omega^{2j}$ for $0 \le j \le n/2-1$, the n/2-th roots of unity.
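The splitting into even and odd powered terms translates into a short recursive routine. The sketch below assumes n is a power of two and takes $\omega = e^{2\pi i/n}$ as the primitive root; the name `evaluate_at_roots` is hypothetical.

```python
import cmath


def evaluate_at_roots(coeffs):
    """Values of p at w^0, w^1, ..., w^(n-1), where w = exp(2*pi*i/n).

    coeffs holds a_0 .. a_(n-1), lowest power first; n is a power of two.
    """
    n = len(coeffs)
    if n == 1:
        return [complex(coeffs[0])]      # a constant polynomial
    s = evaluate_at_roots(coeffs[0::2])  # even powered terms, s(y)
    t = evaluate_at_roots(coeffs[1::2])  # odd powered terms, t(y)
    values = [0j] * n
    for j in range(n // 2):
        w_j = cmath.exp(2j * cmath.pi * j / n)
        product = w_j * t[j]             # used for both w^j and w^(j+n/2)
        values[j] = s[j] + product
        values[j + n // 2] = s[j] - product
    return values


print(evaluate_at_roots([1, 2, 3, 4]))
```

Each level of the recursion does one multiplication and two add/sub steps per pair of points, which is where the linear term in the recurrence for the operation count comes from.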

This strategy for splitting the problem can be applied repeatedly until
we eventually arrive at the trivial problem of evaluating a constant
polynomial. The total number of scalar arithmetic operations performed is
governed by the recurrence relation $T(n) = 2T(n/2) + cn$, where the last term
represents one addition and one multiplication for each point $x_j$. (The
number of multiplications can be cut in half by realizing that the product
$\omega^j t(\omega^{2j})$ is shared by $p(\omega^j)$ and
$p(\omega^{j+n/2})$.) Since the roots of unity are in general complex
numbers, several scalar operations will be needed for each arithmetic
operation as written. The solution to the recurrence, with boundary condition
$T(1)=0$, is given by $T(n) = cn \log_2 n$ for n a power of two.

The algorithm described yields an $O(n \log n)$ algorithm for polynomial
multiplication, the asymptotically best method known. It is still an open
question whether the technique is optimal since the best lower bounds to date
are of order n. Such lower bounds are not surprising since solving the
problem involves processing 2n inputs, each of which must be used at least
once.
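Putting the pieces together gives a complete multiplication routine. The sketch below makes several assumptions not spelled out in the text: the inputs are padded with zeros to a power of two large enough to hold all $2n-1$ product coefficients, interpolation is carried out as the same recursion run with the conjugate roots followed by a division by the number of points, and the results are rounded because the inputs are taken to be integers. The names `fft` and `fft_multiply` are hypothetical.

```python
import cmath


def fft(values, inverse=False):
    """Recursive transform at the n-th roots of unity (n a power of two)."""
    n = len(values)
    if n == 1:
        return list(values)
    sign = -1 if inverse else 1
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    out = [0j] * n
    for j in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * j / n) * odd[j]
        out[j] = even[j] + twiddle
        out[j + n // 2] = even[j] - twiddle
    return out


def fft_multiply(p, q):
    """Product of two integer polynomials (coefficients lowest power first)."""
    length = len(p) + len(q) - 1
    size = 1
    while size < length:                 # room for every product coefficient
        size *= 2
    fp = fft([complex(c) for c in p] + [0j] * (size - len(p)))
    fq = fft([complex(c) for c in q] + [0j] * (size - len(q)))
    pointwise = [x * y for x, y in zip(fp, fq)]
    coeffs = fft(pointwise, inverse=True)
    # the inverse transform is completed by dividing by the number of points
    return [round((c / size).real) for c in coeffs[:length]]


print(fft_multiply([1, 2, 3, 4], [5, 6, 7, 8]))
```

Floating-point complex arithmetic limits the size of coefficients that can be recovered exactly by rounding; an exact variant would need a different number system.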

The algebraic transform serving as the basis of the algorithm is the
well-known fast Fourier transform, or FFT. The FFT traces its origins to the
German mathematicians Carl Runge and H. König in the 1920's. G. C. Danielson
and Cornelius Lanczos (1942) and Irving J. Good (1958) were other early
contributors. A fundamental paper by James W. Cooley and John W. Tukey in
1965 clarified the technique and led to its widespread use. The recursive
formulation of the algorithm described here is due to Allan Borodin and Ian
Munro. Several other researchers, including Charles M. Fiduccia, Ellis
Horowitz, John D. Lipson and Robert Moenck, have also made substantial
contributions in the area to provide an interesting and coherent view of the
relationship between evaluation, interpolation, and modular arithmetic. The
FFT, itself, is utilized in many fields of science and engineering, perhaps
most notably in signal processing applications such as communications, and
speech and image processing.

Polynomial multiplication finds a useful analog in the problem of forming
the product of two n-digit numbers. In fact the divide-and-conquer polynomial
multiplication algorithm using $O(n^{\log_2 3})$ operations, which we
considered earlier, is based on a technique described by the Russian
mathematicians A. Karatsuba and Yu. Ofman in 1962 for the digit product
problem. In 1971 the German mathematicians Arnold Schönhage and Volker
Strassen applied the FFT to produce an algorithm using
$O(n \log n \log \log n)$ digitwise operations to multiply two n-digit
numbers.


Matrix  Multiplication


One final problem we will investigate is that of multiplying two square
matrices. This problem and several related operations arise frequently in
scientific applications of computers. An efficient algorithm for matrix
multiplication can be used, for example, to obtain fast algorithms for
inverting a matrix, finding its determinant, and solving systems of
simultaneous linear equations.

Suppose we are given two n x n matrices A and B. We will denote the
elements of each of these matrices by $a_{ij}$ and $b_{ij}$, where i and j
range between 1 and n. The result of multiplying these two matrices together
is another n x n matrix $C = A \cdot B$, whose entries are given by the
formula $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$. Note that i and j remain
constant in the sum, while k ranges over all n values.

The standard method for multiplying two matrices is to apply directly the
above formula $n^2$ times. Since the product $a_{ik} b_{kj}$ is used in the
computation of exactly one entry, $c_{ij}$, no overlapping of operations is
possible. Observe that n multiplications and $n-1$ additions are used to
calculate each entry, and thus a total of $n^3$ multiplications and
$n^2(n-1) = n^3 - n^2$ additions are used overall.

Thus the standard algorithm uses 8 multiplications and 4 additions to
compute the product of two 2x2 matrices. In 1969 Volker Strassen of the
University of Zurich showed, surprisingly, that only 7 multiplications were
required. Strassen's scheme, which is given in Figure 14, saves one
multiplication at the expense of 14 extra add/sub steps. The key point,
however, is that the method does not make use of the commutativity of
multiplication, and hence can be used as the basis of a divide-and-conquer
algorithm for multiplying larger size matrices.

For example, suppose we want to find the product of two 4 x 4 matrices
A and B. Each of these matrices can be partitioned into four 2x2 submatrices,
as illustrated in Figure 15. Thus we can regard both A and B as 2 x 2 matrices
whose entries are themselves 2x2 matrices. Applying Strassen's algorithm
recursively to obtain the product $C = A \cdot B$, we first form 7 products
of 2 x 2 matrices. Since each of these products can be calculated with 7
scalar multiplications, $7 \cdot 7=49$ multiplications are used overall. This
represents a considerable improvement over the $4^3=64$ multiplications
employed in the standard algorithm!
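For concreteness, here is a sketch of a 7-multiplication scheme for the 2x2 case. The seven products below are the formulas commonly attributed to Strassen; the arrangement in Figure 14 may differ in labelling or order, and the function name `strassen_2x2` is hypothetical. Note that every product keeps an A-term on the left and a B-term on the right, so commutativity is never used.

```python
def strassen_2x2(A, B):
    """Product of two 2x2 matrices using 7 multiplications and 18 add/sub steps."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]


print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Counting operations in the body gives 10 add/sub steps to form the factors and 8 more to combine the products, the 18 mentioned in the text.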

Let us now examine how Strassen's scheme can be applied to obtain a fast
method for multiplying two square matrices of any size n. For simplicity we
will assume that n is a power of two, although this restriction is not
essential. To multiply two n x n matrices, we first partition both of the
matrices into four n/2 x n/2 submatrices. The product of the original n x n
matrices can be formed using Strassen's scheme by computing the product of 7
square matrices of size n/2. To find these products we can apply the
technique once again.
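The recursive application can be sketched as follows, using nested lists for matrices and assuming n is a power of two. The helper names (`add`, `sub`, `split`, `strassen`) are hypothetical, and the seven block products use the commonly cited form of Strassen's formulas, which may be arranged differently in Figure 14. The function also returns the number of scalar multiplications performed.

```python
def add(X, Y):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]


def split(X):
    """Partition X into four (n/2) x (n/2) blocks: X11, X12, X21, X22."""
    h = len(X) // 2
    return ([row[:h] for row in X[:h]], [row[h:] for row in X[:h]],
            [row[:h] for row in X[h:]], [row[h:] for row in X[h:]])


def strassen(A, B):
    """Return (A*B, scalar multiplications used) for n x n matrices."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]], 1
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    products = [
        strassen(add(a11, a22), add(b11, b22)),
        strassen(add(a21, a22), b11),
        strassen(a11, sub(b12, b22)),
        strassen(a22, sub(b21, b11)),
        strassen(add(a11, a12), b22),
        strassen(sub(a21, a11), add(b11, b12)),
        strassen(sub(a12, a22), add(b21, b22)),
    ]
    m1, m2, m3, m4, m5, m6, m7 = [p for p, _ in products]
    count = sum(k for _, k in products)
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]       # join blocks row by row
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom, count


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[2, 0, 1, 3], [1, 1, 0, 2], [4, 5, 6, 0], [7, 8, 9, 1]]
C, mults = strassen(A, B)
print(C, mults)
```

For the 4 x 4 inputs shown, the multiplication count returned is 49, as in the example above.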

We now examine the efficiency of Strassen's algorithm. Let $M(n)$ denote
the number of scalar multiplications used in computing the product of two
n x n matrices. Since this product can be reduced to the problem of forming
7 products of n/2 x n/2 matrices, we have $M(n) = 7M(n/2)$. The solution to
this recurrence relation, with initial condition $M(1)=1$, is
$M(n) = n^{\log_2 7}$. This result is easily obtained by back-substitution:
$M(n) = 7M(n/2) = 7^2 M(n/4) = \ldots = 7^{\log_2 n} M(1) = n^{\log_2 7}$.
Since $\log_2 7 \approx 2.81$, we have that Strassen's algorithm uses
$n^{2.81}$ multiplications, instead of the usual $n^3$, to multiply two
n x n matrices.

What about the number of add/sub steps? For the 2x2 case, the standard
method uses only 4 additions, while Strassen's scheme employs 18. Robert L.
Probert of the University of Saskatchewan showed in 1973 how to reduce this
number to 15 and, moreover, proved that 15 additions were necessary in any
scheme using only 7 multiplications. It appears that although Strassen's
method may save on multiplications, this savings will be more than offset by
the extra cost of the seemingly large number of additions. Surprisingly, we
shall see that for sufficiently large matrices the number of additions is
actually reduced!

Let $A(n)$ denote the number of scalar additions performed when Strassen's
method is used to multiply two n x n matrices. Examination of the algorithm
reveals that this quantity is equal to the number of additions performed in
multiplying 7 matrices of size n/2 plus the scalar additions used in forming
a (18 or 15) sums of n/2 x n/2 matrices. When adding two matrices we merely
add the corresponding pairs of elements using scalar additions, $(n/2)^2$ of
them for two n/2 x n/2 matrices. Hence the recurrence relation describing the
number of additions is $A(n) = 7A(n/2) + a(n/2)^2$. The solution to this
equation, with initial condition $A(1)=0$, is
$A(n) = \frac{a}{3}(n^{\log_2 7} - n^2)$.
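The closed form for the additions can be checked numerically against the recurrence $A(n) = 7A(n/2) + a(n/2)^2$. The sketch below uses hypothetical function names and assumes n is a power of two, so that $n^{\log_2 7} = 7^k$ for $n = 2^k$. It also compares the result with the $n^3 - n^2$ additions of the standard method.

```python
def strassen_additions(n, a=18):
    """A(n) = 7 A(n/2) + a (n/2)^2 with A(1) = 0."""
    if n == 1:
        return 0
    return 7 * strassen_additions(n // 2, a) + a * (n // 2) ** 2


def closed_form(n, a=18):
    """(a/3)(n^(log2 7) - n^2), using n^(log2 7) = 7^k for n = 2^k."""
    k = n.bit_length() - 1
    return a * (7 ** k - n * n) // 3


def standard_additions(n):
    return n ** 3 - n ** 2


for k in range(1, 16):
    n = 2 ** k
    assert strassen_additions(n) == closed_form(n)
    if strassen_additions(n) < standard_additions(n):
        print("first power of two with fewer additions:", n)
        break
```

The loop stops at the first power of two for which the recurrence gives fewer additions than the standard method when a = 18; passing a=15 to both functions gives the corresponding comparison for the 15-addition variant.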

We have just seen that both the number of multiplications and the number
of additions performed by Strassen's algorithm are proportional to
$n^{\log_2 7}$. For sufficiently large values of n, the value of any function
proportional to $n^{\log_2 7}$ will be less than one growing as $n^3$. Hence
Strassen's algorithm is asymptotically faster than the standard one. But when
does it begin to pay to use Strassen's algorithm? Jacques Cohen and Martin
Roth at Brandeis University have shown that the crossover point is at about
$n=40$. Their results are based on timing experiments on an actual computer
which take into account the added overhead incurred by more complex accessing
of the data as well as the number of arithmetic operations performed.

The divide-and-conquer approach underlying Strassen's algorithm might be
used to generate even faster matrix multiplication algorithms. Any
non-commutative scheme for finding the product of two m x m matrices with p
multiplications can serve as the basis of a method for multiplying matrices
of arbitrary size n with $O(n^{\log_m p})$ operations. Thus if we could show
how to multiply 2x2 matrices with 6 scalar multiplications, we could take the
product of arbitrary size matrices with $O(n^{\log_2 6}) \approx O(n^{2.58})$
operations. Unfortunately, John E. Hopcroft and Leslie R. Kerr at Cornell
University and Shmuel Winograd of IBM showed independently in 1971 that this
is impossible. To date the best method for the 3x3 case uses 23
multiplications, while 21 would be needed to better Strassen's result.

Viktor Pan, working at the IBM Thomas J. Watson Research Center, has taken
an entirely different approach to the matrix multiplication problem. In 1979
he exhibited an algorithm whose operation count grows more slowly than
Strassen's $O(n^{2.81})$, but with such a serious increase in the constant of
proportionality that the method would be impractical to implement. To date
all that is known is that at least on the order of $n^2$ operations are
needed. This is not surprising in view of the fact that the input consists of
$2n^2$ matrix elements, and all of the data must be used at least once.
Researchers are actively trying to bridge the gap between the best upper and
lower bounds for this problem.




Bibliography 

The Art of Computer Programming (Vol. 2, Seminumerical Algorithms). Donald E.
Knuth. Addison-Wesley Publishing Co., 1969.

The Design and Analysis of Computer Algorithms. Alfred V. Aho, John E. Hopcroft
and Jeffrey D. Ullman. Addison-Wesley Publishing Co., 1974.

The Computational Complexity of Algebraic and Numeric Problems. Allan Borodin
and Ian Munro. American Elsevier Publishing Co., 1975.

Fundamentals of Computer Algorithms. Ellis Horowitz and Sartaj Sahni. Computer
Science Press, 1978.


List  of  Figures 


1. Complex multiplication.
2. Straight-line algorithms.
3. Comparison of several methods for computing $x^n$.
4. Optimal power tree for $x^n$.
5. Polynomial evaluation via usual and Horner's methods.
6. Modular method for evaluating a polynomial at several points.
7. Polynomial addition and subtraction.
8. Polynomial multiplication.
9. Synthetic division of polynomials.
10. Product of first degree polynomials via usual and three multiplication methods.
11. Divide-and-conquer algorithm for polynomial multiplication.
12. Algebraic transform for polynomial multiplication.
13. Complex roots of unity.
14. Strassen's algorithm for multiplying 2x2 matrices.
15. Divide-and-conquer adaptation of Strassen's algorithm.




Input data

D1. a    D2. b    D3. c    D4. d

Computation steps

Classical method

S1. D1 x D3 = $ac$
S2. D2 x D4 = $bd$
S3. D1 x D4 = $ad$
S4. D2 x D3 = $bc$
S5. S1 - S2 = $ac-bd$ (real)
S6. S3 + S4 = $ad+bc$ (imaginary)

Three multiplication method

S1. D1 + D2 = $a+b$
S2. D3 + D4 = $c+d$
S3. S1 x S2 = $ac+ad+bc+bd$
S4. D1 x D3 = $ac$
S5. D2 x D4 = $bd$
S6. S4 - S5 = $ac-bd$ (real)
S7. S3 - S4 = $ad+bc+bd$
S8. S7 - S5 = $ad+bc$ (imaginary)

Figure 1. Two methods for forming the product of the complex numbers
$(a+bi)(c+di) = (ac-bd)+(ad+bc)i$. The classical method uses 4 multiplications
and 2 additions/subtractions, while the problem can also be solved with 3
multiplications and 5 additions/subtractions. If M and A denote the time
required to perform a single multiplication and addition, respectively, the
second method is faster if $3M+5A < 4M+2A$, or $M/A > 3$.
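The three multiplication method of Figure 1 can be written out step by step as a short sketch. The function name `complex_product_3mult` is hypothetical; the intermediate names follow the step labels of the figure.

```python
def complex_product_3mult(a, b, c, d):
    """Real and imaginary parts of (a+bi)(c+di) using 3 multiplications."""
    s1 = a + b
    s2 = c + d
    s3 = s1 * s2          # ac + ad + bc + bd
    s4 = a * c
    s5 = b * d
    real = s4 - s5        # ac - bd
    s7 = s3 - s4          # ad + bc + bd
    imag = s7 - s5        # ad + bc
    return real, imag


print(complex_product_3mult(1, 2, 3, 4))   # (1+2i)(3+4i)
print((1 + 2j) * (3 + 4j))                 # built-in complex product for comparison
```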
First algorithm

Input data

D1. x

Constants

C1. 2    C2. 4    C3. 5

Computation steps

S1. D1 x D1 = $x^2$
S2. D1 x S1 = $x^3$
S3. C2 x S1 = $4x^2$
S4. C3 x D1 = $5x$
S5. S2 + S3 = $x^3+4x^2$
S6. S5 + S4 = $x^3+4x^2+5x$
S7. S6 + C1 = $x^3+4x^2+5x+2$

Second algorithm

Input data

D1. x

Constants

C1. 1    C2. 2

Computation steps

S1. D1 + C1 = $x+1$
S2. D1 + C2 = $x+2$
S3. S1 x S1 = $x^2+2x+1$
S4. S2 x S3 = $x^3+4x^2+5x+2$

Figure 2. A straight-line algorithm consists of a series of computation
steps in which an arithmetic operation is applied to either the input data,
constants, or the results of prior computation steps. Two algorithms for
computing $p(x) = x^3+4x^2+5x+2$ are shown. The first applies the formula
directly using 4 multiplications and 3 additions. The second, which takes
advantage of the fact that $p(x)$ can be factored as $p(x) = (x+1)^2(x+2)$,
uses only 2 multiplications and 2 additions.

Figure 3. Comparison of several algorithms for computing $x^{23}$. The brute
force method uses 22 multiplications, the binary and factor methods 7, and the
power tree only 6. In the binary method, $23 = (10111)_2$ gives rise to the
computation sequence SSXSXSX where S means "square the result of the previous
step" and X means "multiply the result of the previous step by x". In the
factor method, $x^{23} = x^{22} \cdot x = (x^2)^{11} \cdot x$; letting
$y = x^2$, $y^{11} = y^{10} \cdot y = (y^2)^5 \cdot y$; letting $z = y^2$,
$z^5 = (z^2)^2 \cdot z$.
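The binary method described in the caption of Figure 3 can be sketched as follows. The function name `binary_power` is hypothetical, and the exponent is assumed to be at least 1.

```python
def binary_power(x, n):
    """Compute x**n by the left-to-right binary method.

    Returns (value, multiplications used, sequence of S and X steps).
    """
    bits = bin(n)[2:]                 # e.g. 23 -> '10111'
    result = x                        # accounts for the leading 1 bit
    mults = 0
    steps = ""
    for bit in bits[1:]:
        result = result * result      # S: square the previous result
        mults += 1
        steps += "S"
        if bit == "1":
            result = result * x       # X: multiply the previous result by x
            mults += 1
            steps += "X"
    return result, mults, steps


print(binary_power(2, 23))
```

For the exponent 23 the sketch performs the SSXSXSX sequence, 7 multiplications in all.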


Input data

D1. x    D2. $c_0$    D3. $c_1$    D4. $c_2$    D5. $c_3$

Computation steps

Usual method

S1. D1 x D1 = $x^2$                              (compute powers)
S2. S1 x D1 = $x^3$
S3. D3 x D1 = $c_1 x$                            (multiply by coefficients)
S4. D4 x S1 = $c_2 x^2$
S5. D5 x S2 = $c_3 x^3$
S6. S5 + S4 = $c_3 x^3+c_2 x^2$                  (sum terms)
S7. S6 + S3 = $c_3 x^3+c_2 x^2+c_1 x$
S8. S7 + D2 = $c_3 x^3+c_2 x^2+c_1 x+c_0$

Horner's method

S1. D5 x D1 = $c_3 x$
S2. S1 + D4 = $c_3 x+c_2$
S3. S2 x D1 = $c_3 x^2+c_2 x$
S4. S3 + D3 = $c_3 x^2+c_2 x+c_1$
S5. S4 x D1 = $c_3 x^3+c_2 x^2+c_1 x$
S6. S5 + D2 = $c_3 x^3+c_2 x^2+c_1 x+c_0$

Figure 5. Two methods for evaluating a general third degree polynomial
$p(x) = c_3 x^3+c_2 x^2+c_1 x+c_0$. The usual method is to first compute the
powers of x: $x^2$, $x^3$; then multiply the powers by the appropriate
coefficients: $c_1 x$, $c_2 x^2$, $c_3 x^3$; and finally to sum the terms.
Horner's method uses repeated factoring to evaluate $p(x)$ as
$((c_3 x+c_2)x+c_1)x+c_0$. When evaluating an nth degree polynomial, the usual
method performs $2n-1$ multiplications and n additions, while Horner's method
employs only n operations of each type.
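The two evaluation methods of Figure 5 can be compared with a short sketch that also counts operations. The function names are hypothetical, coefficients are assumed to be given as `[c_0, c_1, ..., c_n]`, and the degree n is assumed to be at least 1.

```python
def usual_method(coeffs, x):
    """Evaluate by computing powers, scaling, then summing.

    Returns (value, multiplications, additions).
    """
    n = len(coeffs) - 1
    powers = [1, x]
    mults = adds = 0
    for _ in range(2, n + 1):
        powers.append(powers[-1] * x)      # compute powers x^2 .. x^n
        mults += 1
    value = coeffs[0]
    for k in range(1, n + 1):
        value += coeffs[k] * powers[k]     # multiply by coefficient, then sum
        mults += 1
        adds += 1
    return value, mults, adds


def horner(coeffs, x):
    """Evaluate as (...((c_n x + c_(n-1)) x + ...) x + c_0."""
    value = coeffs[-1]
    mults = adds = 0
    for c in reversed(coeffs[:-1]):
        value = value * x + c
        mults += 1
        adds += 1
    return value, mults, adds


# p(x) = x^3 + 4x^2 + 5x + 2 at x = 3
print(usual_method([2, 5, 4, 1], 3))
print(horner([2, 5, 4, 1], 3))
```

For this third degree example the usual method reports 5 multiplications and 3 additions, and Horner's method 3 of each, in line with the counts in the caption.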
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Figure  6.  showing  the  products  and  remainders  to  he  computed  when  evaluating 

a  polynomial  p(x)  of  degree  >  3  at  the  U  points  x*a  ,a  ,a  ,a  using  the  modular 

“  1  2  3  >* 

transform  method.  First  the  products  called  "moduli",  are  huilt  up  in  the 
manner  shown  moving  from  the  top  of  the  tree  downward.  Then  the  divisions  of 
p(x)  and  the  subsequent  remainders  indicated  are  computed  in  the  reverse  order, 
going  up  the  tree.  (r>  8  mod  m  means  that  r  is  the  remainder  when  a  is  divided 
by  m. )  The  overall  running  time  of  the  algorithm  is  driven  by  the  time  to 
symbolically  multiply  and  divide  polynomials. 
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Figtire  7.  Polynomial  addition  and  subtraction.  Two  polynomials  are  added 
(or  subtracted)  by  adding  (or  subtracting)  the  coefficients  of  the  corresponding 
powers  of  the  variable  x.  The  sum  and  difference  of  5x*+2x*+3  and  2x*-6x-l  are 
shown. 
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(2**  times  5«*+2e*+3) 


10Bff*+lix*-30B:'-llx*-2x*-l8i-3  product 


Figure  8.  Polynomial  m\xltipllcation.  The  classical  way  to  take  the  product 
of  two  polynomicds  is  to  multiply  each  term  in  the  multiplier  by  the  multiplicand 
and  then  sum  the  results.  The  product  of  5®*+2»*+3  and  2**-6x-l  Is  illustrated. 
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-3**+  x-2  (-1  times  3x*-x+2) 


remainder  Ua?+5 


(-3x*+5ac+3  less  -3x*+x-2) 


Figure  9.  Synthetic  division  of  polynomials  is  similar  to  long  division  of 
two  integers.  First  the  high  order  terms  of  the  dividend  and  divisor  are 
divided,  this  result  is  multiplied  hy  the  entire  divisor,  and  the  resultant 
product  is  subtracted  from  the  entire  dividend  to  yield  a  trial  remainder. 

(In  the  exjnple  shown,  3»*  into  6x*  is  2x,  2x  times  3x*-x+2  is  6x*-2x*+l»x, 
and  6x*-5«*+9*+3  less  6x*-2x*+l(x  yields  a  trial  remainder  of  -3a?*+5®+3.) 

The  entire  process  is  then  repeated  with  the  trial  remainder  in  the  role  that 
the  dividend  played  initially.  (3x*  into  -3x*  is  -1,  -1  times  3x*-x+2  is 
-3»*+a>2,  -3x^+5x+3  less  -3a;^+x-2  is  JMf+5-)  The  procedure  continues  until  a 
trial  remainder  of  degree  less  than  the  divisor  is  obtained. 

Figure 10. Two algorithms for multiplying first degree (i.e., 2-coefficient)
polynomials. The usual method uses 4 scalar multiplications, while the product
can be formed with only 3 scalar multiplications by using extra additions and
subtractions.

[Figure 11, surviving worksheet text: "for a total of 3*3=9 scalar multiplications."]

Figure 11. Divide-and-conquer polynomial multiplication algorithm applied to two
general 4-coefficient (degree 3) polynomials p(x) and q(x). Because two 2-coefficient
polynomials can be multiplied taking 3 products instead of the usual 4, p(x) and q(x)
can be split into two 2-coefficient polynomials and multiplied with 3*3=9 scalar
products. The classical method would have employed 16 products. In general the
divide-and-conquer approach leads to an algorithm using $O(n^{\log_2 3})$ arithmetic
operations overall, while the usual method is $O(n^2)$.
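The divide-and-conquer scheme of Figures 10 and 11 can be sketched as follows. This is a minimal illustration, assuming both inputs have the same length n and that n is a power of two; the function name `karatsuba` (the name by which this three-product trick is commonly known) and the one-element list `counter` used to tally scalar multiplications are conventions adopted only for this sketch.

```python
def karatsuba(p, q, counter):
    """Multiply two polynomials of equal length n (n a power of two).

    Coefficients are lowest degree first. `counter` is a one-element list
    that tallies scalar multiplications (instrumentation for this sketch).
    """
    n = len(p)
    if n == 1:
        counter[0] += 1
        return [p[0] * q[0]]
    half = n // 2
    p_lo, p_hi = p[:half], p[half:]
    q_lo, q_hi = q[:half], q[half:]
    low = karatsuba(p_lo, q_lo, counter)       # p_lo * q_lo
    high = karatsuba(p_hi, q_hi, counter)      # p_hi * q_hi
    p_sum = [a + b for a, b in zip(p_lo, p_hi)]
    q_sum = [a + b for a, b in zip(q_lo, q_hi)]
    mid = karatsuba(p_sum, q_sum, counter)     # (p_lo + p_hi) * (q_lo + q_hi)
    # Extra subtractions recover the cross term p_lo*q_hi + p_hi*q_lo.
    cross = [m - l - h for m, l, h in zip(mid, low, high)]
    result = [0] * (2 * n - 1)
    for i, c in enumerate(low):
        result[i] += c
    for i, c in enumerate(cross):
        result[i + half] += c
    for i, c in enumerate(high):
        result[i + 2 * half] += c
    return result


count = [0]
product = karatsuba([1, 2, 3, 4], [5, 6, 7, 8], count)
```

For the two 4-coefficient inputs above the counter ends at 9, against the 16 scalar products of the classical method.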

Figure 12. Algebraic transform for the product of two n-coefficient polynomials
p(x)·q(x). The classical algorithm obtains the coefficients of the product
polynomial directly using conventional arithmetic. In the transform method
p(x) and q(x) are both evaluated at 2n-1 points, and their values at corresponding
points are multiplied together. This yields the value of p(x)·q(x) at 2n-1 points.
The coefficients of the product polynomial are obtained via interpolation since
there is a unique polynomial with 2n-1 coefficients which fits the points.
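The evaluate, multiply pointwise, and interpolate structure of Figure 12 can be made concrete with a small sketch. Everything here is an assumption made for illustration: the sample points 0, 1, ..., 2n-2 are an arbitrary choice, the helper names are hypothetical, and plain Lagrange interpolation with exact fractions is used. The sketch therefore shows the shape of the transform method, not its speed, which depends on choosing special evaluation points.

```python
from fractions import Fraction


def evaluate(poly, x):
    """Horner evaluation; coefficients are lowest degree first."""
    value = 0
    for c in reversed(poly):
        value = value * x + c
    return value


def interpolate(xs, ys):
    """Coefficients (lowest degree first) of the unique polynomial of
    degree < len(xs) through the points, by Lagrange interpolation."""
    n = len(xs)
    coeffs = [Fraction(0)] * n
    for i in range(n):
        basis = [Fraction(1)]      # running product of (x - xs[j]) for j != i
        denom = Fraction(1)
        for j in range(n):
            if j == i:
                continue
            basis = [Fraction(0)] + basis          # multiply by x ...
            for k in range(len(basis) - 1):
                basis[k] -= xs[j] * basis[k + 1]   # ... and subtract xs[j] * old
            denom *= xs[i] - xs[j]
        for k in range(n):
            coeffs[k] += ys[i] * basis[k] / denom
    return coeffs


def transform_multiply(p, q):
    """Multiply two n-coefficient polynomials via their values at 2n-1 points."""
    n = len(p)
    xs = [Fraction(x) for x in range(2 * n - 1)]
    ys = [evaluate(p, x) * evaluate(q, x) for x in xs]   # pointwise products
    return interpolate(xs, ys)


product = transform_multiply([1, 2, 3, 4], [5, 6, 7, 8])
```

The result agrees with the coefficient-by-coefficient product of the same two polynomials.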

Figure 13. A complex number a+bi can be represented by a vector in a plane using
the real and imaginary parts of the number for Cartesian coordinates. Alternatively
a number can be represented on the same grid by giving its polar coordinates $(r,\theta)$
where $r=\sqrt{a^2+b^2}$ and $\theta=\tan^{-1}(b/a)$. The polynomial $x^n-1$ has n roots, called the n-th
principal roots of unity. Geometrically, the vectors representing these numbers
slice the unit circle into n equal pie-shaped pieces. The polar coordinates of
the fifth roots of unity are shown.

Figure 14. The usual method for multiplying two 2x2 matrices involves 8
multiplications and 4 additions. In 1969 Volker Strassen showed how the number
of multiplications could be reduced to 7 by using 18 additions/subtractions.
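The seven-product scheme mentioned in the Figure 14 caption can be written out directly. The sketch below uses the commonly published form of Strassen's identities; the figure itself is not recoverable here, so its exact arrangement of the seven products is an assumption, as are the function names.

```python
def strassen_2x2(a, b):
    """Product of two 2x2 matrices using 7 scalar multiplications."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # 10 additions/subtractions above, 8 more below: 18 in total.
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]


def classical_2x2(a, b):
    """The usual method: 8 multiplications and 4 additions."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


c = strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```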

ALGORITHMIC COMPLEXITY
Part  4 


by 

Lyle  A.  Anderson 


SYSTEMATIC  ANALYSIS  OF  ALGORITHMS 

ABSTRACT 


The  limits  and  methods  involved  in  the  systematic  analysis 
of  algorithms  are  explored.  A  review  of  the  existing  work 
in  this  field  is  presented.  A  specific  method  of  systematic 
analysis  is  developed.  The  method  consists  of  (1)  the 
translation  of  algorithm  loop  structures  into  recursive 
subroutines  and  recursive  subroutine  references,  and  (2)  the 
semantic  manipulation  of  expressions  representing  the  joint 
probability  distribution  function  of  the  program  variables. 
A  new  delta  function  is  introduced  to  describe  the  effects 
of  conditional  statements  on  the  joint  probability  density 
function  of  the  program  variables.  The  method  is  applied  to 
several  simple  algorithms,  sorting  and  searching  algorithms, 
and  a  tree  insertion/deletion  algorithm. 

CHAPTER 1

INTRODUCTION 

This  chapter  is  divided  into  two  parts.  In  the  first 
part  we  will  state  and  discuss  the  problem  in  computer 
science  that  will  be  addressed  in  the  rest  of  the  thesis. 
In  the  second  part  we  will  give  an  overview  of  the  remaining 
chapters  of  the  thesis. 

Statement  of  the  Problem 

This  thesis  is  concerned  with  the  systematic  analysis 
of  algorithms.  In  order  to  understand  what  it  is  about,  we 
must  answer  these  three  questions: 

1.  What  are  algorithms? 

2.  What  is  the  analysis  of  algorithms? 

3.  What  is  the  systematic  analysis  of  algorithms? 

We  will  also  be  discussing  a  fourth  question: 

4.  What  are  the  limits  of  systematic  analysis? 

This will involve a short discussion of:

a. Gödel's Theorem

b.  The  Halting  Problem 

c.  Characteristics  of  the  Completeness  Problem 


What are Algorithms?

Horowitz  and  Sahni  [7]  give  this  definition  of  an 
algorithm:  "Algorithm  has  come  to  refer  to  a  precise  method 
useable  by  a  computer  for  the  solution  of  a  problem."  In 
order  to  be  considered  an  algorithm  the  method  must  have  the 
following  characteristics: 

1.  A  finite  number  of  steps  of  one  or  more  operations 

2. Each operation must be definite, i.e., unambiguously
defined as to what must be done

3.  Each  operation  must  be  effective,  i.e.  a  person  with 
pencil  and  paper  or  a  Turing  Machine  must  be  able  to 
perform  each  operation  in  a  finite  amount  of  time 

4.  Produce  at  least  one  output 

5.  Accept  zero  or  more  inputs 

6.  Terminate  after  a  finite  number  of  operations 

What  is  the  Analysis  of  Algorithms? 

Webster's New Collegiate Dictionary defines analysis as
"an examination of a complex, its elements, and their relations".
In the analysis of an algorithm we are interested
in the relationship between characteristics of the inputs
and the performance characteristics of the algorithm.
Foremost among these characteristics is the execution time of
the algorithm; that is, the relationship between some sizing
parameter of the input data and the amount of time it takes
for the algorithm to get an answer. Other performance
parameters of interest include:

1. Number of comparisons in sorting/searching
algorithms

2.  Number  of  scalar  multiplications/divisions  in 
algebraic  algorithms,  such  as  matrix-matrix  product 

3.  Number  of  input/output  operations  required  for 
problems  dealing  with  database  access 

4.  Size  of  the  computer  memory  required  to  solve  a 
problem 

All of these performance parameters have one thing in
common: they can all be transformed into the cost of
computing the answer. This is the reason that the analysis of
algorithms is so important. Aside from its intellectual and
recreational aspects, the economic aspects of the analysis
of algorithms are important to the users of computer
systems. Especially in the computer-based industries, time is
money. An algorithm which takes twice as long to run may
not only cost twice as much to run, but may not even get
done in time to be useful. In other applications, accurate
predictions of probable running times are needed before a
system is actually built. These predictions can help make
overall cost and feasibility estimates for a proposed system
more accurate. In these kinds of applications the analysis
of algorithms is a software engineering tool. Other
potential uses are in automatic program synthesizers or in
compiler systems for very high-level languages [1].

In most cases the analysis of an algorithm consists of
determining the time behavior of the algorithm. This is not
the only measure of a program for which an analysis can be
performed. An algorithm can be analyzed by "instrumenting"
it, meaning that the values of the parameter of interest are
recorded in a counter variable which is added to the
algorithm. We often do this when analyzing the time
behavior of an algorithm. For this reason the analysis of
different measures has a great deal in common with the
analysis of time behavior. When we talk about the analysis of
an algorithm, we will only be concerned with its time
behavior unless otherwise stated.
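As a small illustration of what "instrumenting" means, the sketch below adds a counter variable to a routine that finds the largest element of a list. The routine and its name `find_max_instrumented` are hypothetical examples chosen here; the counter records element comparisons, but the same pattern applies to any performance parameter of interest.

```python
def find_max_instrumented(values):
    """Return (maximum, comparisons) for a non-empty list.

    `comparisons` is the counter variable added purely for analysis;
    it does not affect the value computed by the algorithm.
    """
    comparisons = 0
    largest = values[0]
    for v in values[1:]:
        comparisons += 1          # record one element comparison
        if v > largest:
            largest = v
    return largest, comparisons


result = find_max_instrumented([3, 1, 4, 1, 5, 9, 2, 6])
```

For a list of n elements the counter always ends at n-1, whatever the order of the input.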


What  is  the  Systematic  Analysis  of  Algorithms? 

There  are  two  basic  ways  to  approach  the  analysis  of 
algorithms. The first way is to approach each algorithm as
a  separate  new  problem  and  to  find  the  solution  by  appealing 
to  previous  experience  with  similar  problems.  The  second 
way  is  to  make  up  general  rules  which  apply  to  "all" 
algorithms  and  to  apply  these  rules  step  by  step  to  the 
algorithm  being  studied. 

The first way is very suitable to humans, who come
equipped with a great deal of problem-solving and
pattern-recognition ability. It is not so well suited to the
digital computers of today because they are not so equipped.
The more systematic approach of the second way to analyze
algorithms is better suited to implementation by digital
computers. We shall say that the human approach involves ad
hoc procedures, and the computer approach involves
systematic procedures.


What are the Limits of Systematic Analysis?

The  gross  limits  of  systematic  or  automatic  algorithm 
analysis  are  known. 

1.  We  know  that  systems  can  be  built  which  will  analyze 
simple  programs.  [1,3,4] 

2. We know that no completely automatic system or
complete formal system can be constructed which can
analyze all algorithms. This fact is firmly
established by computability theory. [15]

In  between  the  simple  programs  and  all  possible  programs 
there  is  a  lot  of  ground  which  can  be  covered. 


What  We  Can  Do 

Wegbreit [1] has built a system which can analyze
simple LISP programs automatically. Cohen and Zuckerman [3]
have built a system which greatly aids in the analysis of
algorithms written in an ALGOL-like programming language.
Their system helps the analyst with the details of the
analysis while requiring the analyst to provide the
branching probabilities. Wegbreit [2] developed a formal system
for the verification of program performance. His technique
can also be used to provide the branching probabilities
which are needed. Recently, Ramshaw [5] has shown that
there are problems with Wegbreit's probabilistic approach
and has developed a formal system which he calls the
Frequency System. There are problems with the Frequency
System, which Ramshaw points out in his thesis [5]. We will
show that some of the problems in the Frequency System can
be overcome.


What  We  Cannot  Do 

Douglas R. Hofstadter [15] gives a beautiful exposition
of the nature of the whole question of computability and
decidability and the wide-ranging and unexpected topics upon
which it touches. The formal study of this subject springs
from Gödel's Theorem, which Hofstadter paraphrases:

"All consistent axiomatic formulations of number
theory include undecidable propositions."

The undecidability of the Halting Problem is an example
of one such "undecidable proposition." Stated in terms of a
Turing Machine, the Halting Problem is this:

Can  one  construct  a  Turing  Machine  which  can  decide 
whether  any  other  Turing  Machine  will  halt  for  any 
input,  when  given  an  input  tape  containing  a 
description  of  the  other  Turing  Machine  and  its 
input? 

A negative answer to this question was given in 1937 by
Alan Turing. The argument which he used is called a diagonal
method. This method was discovered by Georg Cantor, the
founder of set theory. It involves feeding a hypothetical
Turing Machine, which could decide whether any other Turing
Machine would halt for any input, a description of itself
which has been modified in a particularly diabolical manner.
Hofstadter's book [15] devotes much of its 740 pages to the
variety of topics to which this method may be applied.

It  appears  to  us  that  undecidability  and  incompleteness 
creep  into  formal  systems  when  statements  which  can  be 
interpreted  as  being  about  the  system  itself  are  allowed. 
In  our  discussions  we  will  try  to  avoid  these  kinds  of 
questions,  and  thereby  the  completeness  problem. 

Overview  of  the  Thesis 

We have chosen to organize this thesis along the lines
which were taken in the development of the research upon
which it is based. We feel that the road taken is
interesting in and of itself. For this reason we will point out the
"dead-ends" which periodically blocked our path.

The first step which we took was a survey of the work
which had been done in this field. In Chapter 2, we will
discuss the current state of the art of algorithm analysis.
We will point out the areas where results are firmly
established and the benefits of particular procedures that are
known. We will examine some of the recent advances both to
see how they work and to discover the kinds of problems
which they cannot solve.

When this survey was completed we formulated a plan.
The approach which we used was to start from the program
statements themselves. We attempted to determine just how
much could be learned from manipulations of the programs
using various translation schema. We restricted ourselves
to programs written in a "structured" language. SPARKS,
developed by Horowitz and Sahni [7,9], was chosen as the
language for representing algorithms for the same reasons
they used it in their books.

Our initial work revealed a transformation which proved
to be effective in analyzing several deterministic
algorithms in a straightforward manner. Chapter 3 describes
this technique, which involves the transformation of all
looping structures of a program into a series of recursive
subroutines and recursive subroutine calls. Because this
process is designed to follow the syntax of the algorithm,
we refer to this as a "syntax-directed translation." The
program characteristic to be analyzed is selected, and the
recursive program statements are transformed into recurrence
equations. The analysis is done by solving the recurrence
equations. This is not always easy [8]. For this reason we
concerned ourselves with solving as well as setting up the
recursions.
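To give a feel for the loop-to-recursion idea, the sketch below rewrites one simple loop as a recursive subroutine and then writes down a recurrence for one chosen characteristic, the number of additions into the running total. It is a hand-made Python illustration and not the SPARKS translation schema of Chapter 3; all function names are hypothetical.

```python
def loop_sum(n):
    """Original looping form: sum the integers 1..n."""
    total = 0
    i = 1
    while i <= n:
        total = total + i
        i = i + 1
    return total


def loop_as_recursion(i, n, total):
    """The while loop rewritten as a recursive subroutine: one call
    corresponds to one test of the loop condition."""
    if i <= n:
        return loop_as_recursion(i + 1, n, total + i)
    return total


def additions(i, n):
    """Recurrence for the number of `total + i` additions:
    A(i, n) = 1 + A(i + 1, n) if i <= n, and 0 otherwise."""
    if i <= n:
        return 1 + additions(i + 1, n)
    return 0
```

Solving the recurrence by unrolling gives A(1, n) = n for n >= 0, which is the analysis result for this loop; in this toy case the recurrence is trivial to solve, unlike the harder cases the text alludes to.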

In Chapter 3, we will examine some very simple,
deterministic algorithms (i.e., ones for which we know the inputs
exactly), then some very simple probabilistic algorithms
(i.e., ones where we only know some characteristics of the
inputs). While looking at these examples we will discover
the "problem of the conditional statement." We started with
the FINDMAX algorithm which was analyzed both by Knuth [6]
and by Ramshaw [5]. We soon discovered that when the
statistical behavior of algorithms is being analyzed, the
distribution from which the input data is drawn is an
important factor in the running time. While we could solve
the problems relating to distributions in algorithms such as
FINDMAX, we often found ourselves using information from
"outside the system".

Chapter 4 presents our formal approach for handling the
conditional statement. This approach is to use statements
about the distributions of program variables directly in the
analysis of the algorithms. We found that we had to study
the propagation of the distributions of the program
variables through the program. As a result, we developed a
"calculus" for the behavior of the distributions themselves.
We will use this method to analyze the probabilistic
algorithms from Chapter 3.

We  will  then  move  on  and  apply  the  techniques  to  some 
sorting  and  searching  algorithms  in  Chapter  5,  and  to  a 
miscellaneous  problem  in  Chapter  6.  Chapter  7  is  a  summary 
of  the  work  and  an  outline  of  possible  future  efforts. 

Appendix  A  contains  some  details  of  the  work  discussed 
in  Chapter  5. 

CHAPTER 2


CURRENT  STATE  OF  THE  ART 

In this chapter, we will discuss what is currently
known about the analysis of algorithms. The chapter is
divided into two sections. The first discusses what we call
ad hoc procedures, and the second discusses current
systematic approaches.


Ad  Hoc  Procedures 

We are going to characterize an analysis technique as
"ad hoc" if we cannot see a way to easily remove the
"intuition" required to get the answers. The analysis
procedures which are so categorized are more suited for use by
humans than for the programming of a computer. They take
advantage of the rich background of experience which forms
the context of a human's ability to perform such analysis.
We will present the techniques of three sets of researchers
in order of increasing mathematical elegance of the
techniques. A method with a high degree of elegance is very
hard for the uninitiated to understand, but facilitates
quick and meaningful communication between the initiated.

de Freitas and Lavelle

The most straightforward, and hence the least elegant,
way to analyze an algorithm is to write down how long each
statement takes and to add up the result. S. L. de Freitas
and P. J. Lavelle describe "A Method for the Time Analysis
of Programs" [4] which does the first part of this
procedure. Their method consists of superimposing timing data
about the assembly/machine code produced by a FORTRAN
program on the program source listing. The programmer may
then use the timing information to identify inefficient
portions of the program. The method does not calculate the
repetition counts for loops, but presents the time required
to perform one iteration of a loop. It therefore requires
the application of all the ad hoc analysis techniques we
will describe, but allows the analyst to come up with exact
answers to time performance questions. Even though it uses
a computer program, it can still be considered an ad hoc
technique.


Aho, Hopcroft and Ullman
Horowitz and Sahni

Aho, Hopcroft and Ullman [10] and Horowitz and Sahni
[7] describe a level of analysis which is one step removed
from the machine dependent technique described above. This
level deals with the statements of the algorithm as
primitive entities and largely ignores the variation in execution
time between them. This type of analysis seeks
order-of-magnitude or "Big O" performance data. In their excellent
introductory text [7], Horowitz and Sahni are primarily
interested in this kind of analysis. They introduce a
methodology which is very close to the high level "code" of
the algorithm to be analyzed. Aho, Hopcroft and Ullman [10]
give an excellent presentation of the various computer and
computability models which have been used.

Knuth's  Analysis  Techniques 

It would be unfair to imply that Knuth's techniques are
all ad hoc. Nothing could be further from the truth. Donald
E. Knuth, perhaps more than anyone else, has established the
definitions and directions of algorithmic analysis [6].
Jonassen and Knuth present an ad hoc tour de force in "A
Trivial Algorithm Whose Analysis Isn't" [8]. In the
beginning of his book [6], Knuth sets down the tools and
techniques which may be brought to bear during the analysis of an
algorithm. It is this grouping of techniques which we refer
to as "ad hoc":

1.  Mathematical  Induction 

2.  Sums  and  Products 

3.  Elementary  Number  Theory  and  Integer  Functions 

4.  Permutations  and  Factorials 

5.  Binomial  Coefficients 

6.  Harmonic  Numbers 

7.  Generating  Functions 

8.  Euler's  Summation  Formula 

9.  Combinatorics 


The application of these techniques requires a
considerable amount of intuition and experience in the analysis of
algorithms. The analyses which result are characterized by
a high degree of abstraction.

Systematic  Approaches 

We now begin a discussion of systematic approaches to
the analysis of algorithms. These methods are characterized
by the exposition of a "theory" which is applied
consistently in the analysis of algorithms. We will discuss three
manual approaches in order of increasing effectiveness, and
then discuss two automatic analyzers. The manual approaches
which we will discuss are:

1.  Electrical  Network  Analysis 

2.  Wegbreit's  Probability  System 

3. Ramshaw's Frequentistic System

For  each  one  we  will  cover  the  theoretical  basis  of  the 
system,  describe  how  it  works,  give  an  example,  and  discuss 
the  inherent  weaknesses  and  their  causes. 

Electrical  Network  Analysis 

Knuth mentions the applicability of Kirchhoff's Current
Law to the analysis of algorithms and applies it quite often
[6]. He also mentions that Kirchhoff's Voltage Law is not
applicable to the analysis of algorithms. A way to
introduce Kirchhoff's Voltage Law into the analysis of
algorithms was proposed by Kodres [13] and extended by
Davies. The following section closely follows Davies [14].

A generalization of Kirchhoff's Voltage and Current Laws is
applied to the analysis of program or algorithm flowcharts
in the following way:

1. the number of executions of a statement corresponds
to the current in an electrical circuit

2. the execution time of a statement corresponds to the
resistance of a circuit element

3. the total time spent executing the statement
corresponds to the voltage across an electrical
circuit element

Kirchhoff's Current Law states that the sum of all
currents at any circuit node is zero. By assigning a "sign"
to the direction of flow in the flowchart, it is easy to
show that this is true for the number of executions in a
flowchart. The number of times into any node in the
flowchart is equal to the number of times out of that node.
Kirchhoff's Voltage Law states that the sum of all voltage
drops and emf's around any circuit loop is zero. The
analogy for the voltage law breaks down in the case of
parallel connected sections in a flowchart. Here Kodres
introduced the idea of placing "current" sources in each
closed loop in the flowchart. The value of the current
source is equivalent to the number of times the loop is
executed.
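The current-law analogy can be checked on a tiny flowchart. The sketch below is an illustration constructed here, not an example from Kodres or Davies: it traces a four-node flowchart for the program `i = 0; while i < n: i = i + 1`, counts how often each flowchart edge is traversed (the analog of a current), and verifies that the number of times into a node equals the number of times out of it. The node names and both function names are hypothetical.

```python
from collections import Counter


def run_loop_flowchart(n):
    """Trace the flowchart start -> test -> (body -> test)* -> exit and
    count how often each edge is traversed."""
    edge_counts = Counter()
    node = "start"
    i = 0
    while node != "exit":
        if node == "start":
            nxt = "test"
        elif node == "test":
            nxt = "body" if i < n else "exit"
        else:                      # node == "body"
            i += 1
            nxt = "test"
        edge_counts[(node, nxt)] += 1
        node = nxt
    return edge_counts


def flow_is_conserved(edge_counts, node):
    """Kirchhoff's Current Law analog: executions in == executions out."""
    inflow = sum(c for (src, dst), c in edge_counts.items() if dst == node)
    outflow = sum(c for (src, dst), c in edge_counts.items() if src == node)
    return inflow == outflow


counts = run_loop_flowchart(5)
```

The law holds at the interior nodes `test` and `body`; `start` and `exit` act as the source and sink of the single unit of "current" that enters and leaves the program. The count on the edge from `test` to `body` is the loop's execution count, the quantity that the current source in the closed loop represents.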

In the examples which follow, this notation applies:

[symbol illegible in source] is the fractional execution count for the true (t)
branch of an if statement

T is a prefix that indicates that the quantity is an
execution time for a program block or element
(examples are TA and TC)

n is the number of executions of a loop body

The expressions which are given with each program
construct represent the equivalent "voltage" or total
execution time of the block in question.

The structured programming constructs involving closed
flowchart loops are translated as follows:

• if-then-else is equivalent to a single statement block
[flowchart and value expression not recoverable]

• do-while is equivalent to a single statement block
with a value of [expression not recoverable]

• do-until is equivalent to a single statement block
[flowchart and value expression not recoverable]
The limit of this approach is clear and has been
pointed out by all who have written about the technique.
The difficult part of the analysis of algorithms is the
determination of the number of times a loop is executed or,
in this analogy, the value of the current source. However,
if one could solve this problem, then this technique
guarantees that one can get the solution to any structured
flowchart.


Wegbreit's Probability System

Wegbreit's systematic approach to the analysis of
algorithms was introduced in an article on "Verifying
Program Performance" [2]. The analysis of the algorithm is a
natural by-product of proving that the program/algorithm is
correct, and a refinement of the use of well-ordered sets,
first suggested by Floyd. The algorithm is instrumented to
record the desired performance parameter. Then the
appropriate probabilistic input assertions are made about
variable probability distributions and inductive assertions are
shown to hold at intermediate stages in the algorithm. When
one of the inductive assertions can be shown to be a loop
invariant it can be manipulated into a statement about the
algorithm's performance. The important advance of
Wegbreit's probability system is that it sets out to
calculate the branching probabilities in order to determine
average computation time.

Ramshaw [5] states that this method is based on the
ideas of Floyd and Hoare. It uses formal reasoning about
predicates of the form $\Pr(P) = e$, $0 \le e \le 1$, which means that
the probability that the predicate P is true is equal to the
real-valued expression e. Ramshaw has shown [5] that systems
of this form have problems with a very simple program which
he calls the Leapfrog Problem:

Leapfrog: if K = 0 then K ← K + 2 endif

We assume that K can take on the values of 1 and 0 with
equal probability, i.e.,

$[\Pr(K=0)=\tfrac{1}{2}] \wedge [\Pr(K=1)=\tfrac{1}{2}]$

The output assertion which one would expect to get is:

$[\Pr(K=1)=\tfrac{1}{2}] \wedge [\Pr(K=2)=\tfrac{1}{2}]$

However, all that can be asserted using a Floyd-Hoare system
is:

$\Pr([K=1] \vee [K=2]) = 1$

This is not particularly informative or of much use in
subsequent portions of the program since all of the
information about the distribution of the input has been
lost.


Ramshaw's Frequentistic System

In his Ph.D. dissertation, Ramshaw [5] reformulates the
ideas about probabilistic assertions into what he calls
"frequentistic" assertions. In this way he "avoids the
rescalings that are associated with taking conditional
probabilities." Ramshaw's frequency "is like probability in
every way except that it doesn't always have to add up to
one." He defines a frequentistic state as a collection of
deterministic states with their associated frequencies.
Atomic assertions are statements of the form $\mathrm{Fr}(P)=e$, where
P is a predicate and e is a real-valued expression.

Ramshaw applies his frequency system successfully to
the Leapfrog problem.

Leapfrog: if K = 0 then K ← K + 2 endif

His input assertion is:

$[\mathrm{Fr}(K=0)=\tfrac{1}{2}] \wedge [\mathrm{Fr}(K=1)=\tfrac{1}{2}]$

This means that the frequency associated with the state K=0
is 1/2 and the frequency associated with the state K=1 is also
1/2. The total frequency associated with the variable K is

$\tfrac{1}{2}+\tfrac{1}{2} = 1$

So far we have followed Ramshaw's thesis closely. The
following is a slightly different interpretation of the
application of his method which arrives at the same answer.
We present it here in this way because it seems a little
more formal than his presentation.

The if-test on the predicate { K=0 } conjoins the
branch atomic assertion $[\mathrm{Fr}(K\neq 0) = 0]$ to the TRUE
out-branch. This is derived by setting the frequency of the
negation of the if-test predicate equal to zero. For the
FALSE out-branch, the branch atomic assertion is $[\mathrm{Fr}(K=0) = 0]$.
This simply states that the frequency with which the
if-test predicate is true in the FALSE out-branch is zero!

Each atomic assertion in the input assertion is
individually resolved with the branch atomic assertion, in
the manner of theorem proving systems. If there is a
contradiction, then that conjunct of the input assertion is
dropped. In the TRUE branch we have:

$[\mathrm{Fr}(K=0)=\tfrac{1}{2}] \wedge [\mathrm{Fr}(K\neq 0)=0]$

which is logically consistent, but

$[\mathrm{Fr}(K=1)=\tfrac{1}{2}] \wedge [\mathrm{Fr}(K\neq 0)=0]$

is a contradiction and is dropped. In the FALSE branch we
have:

$[\mathrm{Fr}(K=0)=\tfrac{1}{2}] \wedge [\mathrm{Fr}(K=0)=0]$

which is a contradiction, and

$[\mathrm{Fr}(K=1)=\tfrac{1}{2}] \wedge [\mathrm{Fr}(K=0)=0] = [\mathrm{Fr}(K=1)=\tfrac{1}{2}] \wedge [\mathrm{Fr}(K\neq 1)=0]$

which is a valid assertion.

In the TRUE branch, the assignment statement changes
the deterministic states of K to have the value K+2.

$[\mathrm{Fr}(K=2)=\tfrac{1}{2}] \wedge [\mathrm{Fr}(K\neq 2)=0]$

The assignment statement maps all of the frequencies of
the states of K in this branch into the frequency of the
state K+2.

At the final join, the output assertion is the
conjunction of the two branch assertions, namely:

$[\mathrm{Fr}(K=2)=\tfrac{1}{2}] \wedge [\mathrm{Fr}(K\neq 2)=0] \wedge [\mathrm{Fr}(K=1)=\tfrac{1}{2}] \wedge [\mathrm{Fr}(K\neq 1)=0]$

This statement contains the logical contradiction:

$[\mathrm{Fr}(K\neq 1)=0] \wedge [\mathrm{Fr}(K\neq 2)=0]$

Unlike the case with the restriction at the if-test, a
contradiction at the join (which must be between atomic
assertions from separate out-branches) is resolved by
conjoining each branch's contribution to a given
frequentistic state within a single predicate. In this
case:

$[\mathrm{Fr}(K\neq 1)=0] \wedge [\mathrm{Fr}(K\neq 2)=0] \Longrightarrow [\mathrm{Fr}(K\neq 1 \wedge K\neq 2)=0]$.

We arrive at Ramshaw's output assertion:

$[\mathrm{Fr}(K=1)=\tfrac{1}{2}] \wedge [\mathrm{Fr}(K=2)=\tfrac{1}{2}] \wedge [\mathrm{Fr}(K\neq 1 \wedge K\neq 2)=0]$.

This result is a little more useful! It says that K is
either 1 or 2 and that it takes on either value with equal
probability.

Now, one would think that all this would lead to a very
powerful method. It does. Ramshaw shows how to apply this
straightforward approach to the COINFLIP algorithm in
Chapter 5 of his thesis [5]. His analysis is very similar
to the one that we will give in Chapter 4. But, instead of
continuing to use the more straightforward approach,
Ramshaw follows Kozen's semantics for probabilistic
programs, applies measure theory, and shifts to a
"theorem-proving" approach. He uses the following rule of
consequence to prove theorems about the conditional
statement:

$$\frac{\vdash [A \mid P]\; S\; [B], \qquad \vdash [A \mid \neg P]\; T\; [C]}{\vdash [A]\ \mathbf{if}\ P\ \mathbf{then}\ S\ \mathbf{else}\ T\ \mathbf{fi}\ [B+C]}$$

This rule of consequence means that, if the truth of
predicate A given that P is true implies that B is true
after the execution of program section S, and if the truth
of predicate A given that P is false implies the truth of
predicate C after the execution of program section T, then
if A is true before the if statement involving P, S, and T,
then it follows that either B or C is true afterward.

Ramshaw's frequency system can handle some of the
programs which Wegbreit's can't, because Ramshaw avoids
problems of renormalizing probabilities. But because Ramshaw
chose to use this rule of consequence for the if statement,
his system still can't handle the "useless test":

if  R  then  nothing  else  nothing  endif. 

Ramshaw  must  include  a  special  rule  of  consequence  for 
the  "useless  test"  (one  that  says  that  nothing  happens) . 
This  seems  to  be  symptomatic  of  those  formal  systems  of 
algorithm  analysis  which  have  grown  from  the  work  in  program 
verification  based  on  theorem  proving. 

We  have  just  given  a  taste  of  Ramshaw's  frequency 
system.  Readers  who  are  interested  in  learning  more  about 
it  should  see  Ramshaw's  dissertation  [5] . 

Automatic  Analyzers 

We  now  turn  our  attention  to  the  current  state  of 
automatic  analysis.  We  will  look  at  two  systems  which  have 
been reported in the literature.

Wegbreit's METRIC


METRIC  [1]  is  a  system,  written  in  Interlisp,  which  is 
able  to  analyze  simple  LISP  programs  and  produce  closed-form 
expressions  for  the  parameter  of  interest  in  terms  of  the 
size  (in  some  sense)  of  the  input.  The  analysis  of  a 
program  takes  place  in  three  distinct  phases: 

1.  Assign a cost to each primitive operation. This
    process continues as long as the procedure is not
    recursive. Blocks of primitive operations are
    assigned the cost of the sum of their individual
    costs.

2.  Analyze the recursive procedures. This phase
    analyzes how the recursion variables change from one
    iteration to the next. A series of difference
    equations is generated by projecting this recursive
    structure onto the set of integers.

3.  Solve the difference equations. This phase finds a
    closed-form expression for the difference equations.
    Wegbreit has implemented solutions to these
    equations based on: direct summation, pattern matching,
    elimination of variables, best-case/worst-case
    analysis, and differentiation of generating functions.

In  Wegbreit's  processing  of  conditional  statements,  he 
assumes  that  all  tests  are  independent.  This  is  perhaps  the 
most  serious  flaw  in  the  approach.  Again  the  problem  stems 
from the difficulty in handling conditional probabilities.

Cohen and Zuckerman's EL/PL


Evaluation Language/Programming Language [3] is a
system that consists of an ALGOL-like language for
expressing algorithms (PL) and a language for analyzing the
resulting algorithms (EL). The PL statements are compiled by
the PL compiler into a symbolic formula representing the time
for executing the program. This "object deck" is presented to
the EL processor. The EL processor, in turn, provides a
human operator with the means to manipulate the symbolic
formula into answers. EL runs in an interactive mode. It
allows the operator to bind formal or numerical values to
the execution counts of loops and to assign formal or
numerical values to the probabilities of boolean expressions.

Here,  as  with  METRIC,  the  operator  has  to  provide  the 
critical  data  on  the  branching  probabilities.  The  branching 
probabilities  of  different  conditional  statements  are 
assumed  to  be  independent  of  each  other.  This  seems  to  be 
the most serious defect in the automatic analyzers to date.

CHAPTER 3


SYNTAX  DIRECTED  TRANSLATION  APPROACH 

In  this  chapter,  we  will  discuss  our  approach  to  the 
systematic  analysis  of  algorithms.  The  presentation  follows 
the  order  in  which  the  work  actually  progressed.  Our 
research  was  sparked  by  the  arrival  of  Ramshaw's  thesis  [5] . 
It  seemed  to  us,  at  the  time,  that  the  theorem-proving 
approach  was  overly  mathematical.  There  must  be,  we  said,  a 
way  to  look  at  this  which  is  more  closely  related  to  the 
code and more understandable by programmers. Wegbreit's
article  on  METRIC  [1]  got  us  thinking  about  the  utility  of 
translating  program  loops  into  recursive  subroutines. 

Loops make the analysis of algorithms interesting.
Without loops it's once through and done. Straight line
code is easy to analyze. When you add some branching
statements it gets a little harder; but it's the loops which
make an analysis really interesting. The first observation is
that there has been a lot of work done on solving recurrence
relations. If we can convert all of the different loop
structures to recursive subroutine calls, then we can apply
the same techniques to attempt to analyze all kinds of
loops. In fact, one can do exactly that, as Wegbreit [1]
points out. He also points out that if there are no
conditional branches in the loops, then there is an exact
solution to the recurrence relations. Our procedure is
basically quite simple:

1.  Convert  all  loops  into  recursive  subroutine  calls 

2.  Convert  the  recursive  subroutine  calls  into 
recurrence  relations 

3.  Solve  the  recurrence  relations 

Solving  Recurrence  Relations 

There  are  three  basic  methods  for  solving  recurrence 
relations: 

1.  Inspect  the  relation  to  see  if  you  have  seen  it 
before  in  another  problem,  or  recognize  a  general 
form 

2.  Try a few iterations to get the feel of the
recurrence relationships and the way the relations
behave, then guess a closed-form answer, and prove
its correctness by induction

3.  Apply  one  of  the  standard  techniques  to  solve  the 
recurrence  relation 

Within these simple steps are contained a lot of art
and experience. G. S. Lueker in a recent tutorial "Some
Techniques for Solving Recurrences" [16] gives an excellent
introduction to these methods. Advanced techniques can be
found in Knuth [6], and especially Jonassen and Knuth [8].

We shall list some of the techniques mentioned by
Lueker [16].

1.  Summing factors -- where one tries to manipulate the
    recurrence relations by addition of expressions for
    adjacent terms in the hope that the sum will
    "telescope" into a few terms, one of which is the
    n-th term.

2.  Characteristic equations -- where the problem is
    mapped into that of finding the roots of a
    characteristic system of polynomial equations. This
    approach works for linear recurrences with constant
    coefficients.

3.  Range transformation -- where the unknown
    coefficients in the recurrence relations are transformed
    by some function which turns an unknown problem into a
    known problem, or one that can be solved by another
    technique.

4.  Domain transformation -- where the index value is
    transformed to make the progression of values
    additive instead of some other function. Once this
    is done, summing factors can often be used.

5.  Generating functions -- where the problem is
    transformed into another domain in a way similar to
    the transformation of a time-domain function into a
    frequency-domain function by a Fourier transform.
    This method is particularly powerful for handling
    probabilistic aspects of solutions.

Our work in this thesis involved some very familiar
recurrences for which the answers were easily guessed.


Translating  Loops  into  Recursive  Subroutines 
We will limit our discussion to algorithms expressed
using structured programming constructs only. This is not a
particularly restrictive limitation since the structured
programming constructs are all that is theoretically needed
to describe any algorithm. For this reason and the fact
that such programs are easier to maintain, most new
programming is being done using structured programming
methods.

We will adopt SPARKS as the language for expressing
algorithms. SPARKS was developed by Horowitz and Sahni in
1976 [9] and slightly modified in 1978 [7].

We have developed a formal syntax-directed translation
schema for converting structured loop constructs into
recursive subroutines.

Given the input syntax of the FOR loop:

    <label>: for <var> <- <exp1> to <exp2> by <exp3> do
        <statements with live variables>
    repeat

we get the recursive syntax:

    start <- <exp1>; stop <- <exp2>; incr <- <exp3>
    <var> <- start
    call <label>(<var>, incr, stop, { live variables })

    procedure <label>(var, incr, stop, { live variables })
        if SGN(incr) * ( stop - var ) >= 0 then
            <statements with live variables>
            var <- var + incr
            call <label>(var, incr, stop, { live variables })
        end if
    end <label>

The live variables from <statements> are those
variables which are used or created in <statements> and have
a scope that extends outside of <statements>.

The procedure for converting DO WHILE loops to
recursive subroutine calls is quite similar.

    <label>: while < relational expression > do
        < statements with live variables >
    repeat

The recursive syntax is:

    call <label>( {live variables, relational variables} )

    procedure <label>( {live variables, relational variables} )
        if < relational expression > then
            < statements with live variables >
            call <label>( {live variables, relational variables} )
        end if
    end <label>
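
The FOR-loop schema can be tried out directly. The sketch below is illustrative only: Python stands in for SPARKS, and the names `sgn`, `for_loop`, and `for_recursive` are hypothetical. It assumes a nonzero increment and takes the loop test as inclusive (`>=`), so that the final value `stop` is visited, which is an assumption about the intended SPARKS semantics. Both forms record the values taken by the loop variable so they can be compared.

```python
def sgn(x):
    """Sign of x: -1, 0, or 1 (stands in for the SGN built-in)."""
    return (x > 0) - (x < 0)


def for_loop(start, stop, incr):
    """Iterative form: for var <- start to stop by incr do ... repeat."""
    visited = []
    var = start
    while sgn(incr) * (stop - var) >= 0:
        visited.append(var)          # <statements with live variables>
        var = var + incr
    return visited


def for_recursive(var, incr, stop, visited):
    """Recursive form produced by the translation schema."""
    if sgn(incr) * (stop - var) >= 0:
        visited.append(var)          # same loop body as above
        var = var + incr
        for_recursive(var, incr, stop, visited)
    return visited


if __name__ == "__main__":
    print(for_loop(2, 6, 1))
    print(for_recursive(2, 1, 6, []))
```

The list `visited` plays the role of a live variable: it is created outside the loop body and passed through every recursive call.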


Simple  Examples 

do while example (Algorithm for n^n)

The following algorithm is a modification of one by
Horowitz and Sahni [10].

    procedure N_to_the_N
        read R1
        R2 <- 1; R3 <- R1
        T1: while R3 > 0 do
            R2 <- R2 * R1; R3 <- R3 - 1
        repeat
        print R2
    end N_to_the_N

This procedure contains a single while loop which we
wish to analyze. The time behavior of this algorithm is
dominated by the number of times that the body of the while
loop is executed. We first translate the while loop into a
recursive subroutine. The algorithm becomes:

    procedure N_to_the_N
        read R1
        R2 <- 1; R3 <- R1
        call T1( R1, R2, R3 )
        print R2
    end N_to_the_N

    procedure T1( R1, R2, R3 )
        if R3 > 0 then
            R2 <- R2 * R1; R3 <- R3 - 1
            call T1( R1, R2, R3 )
        end if
    end T1

Only program variable R3 has any effect on the course
of the recursion. Let i be the mathematical variable which
corresponds to R3, and T be the number of calls on the
subroutine. Then:

$$T(i) = \begin{cases} 1, & \text{if } i \le 0 \\ 1 + T(i-1), & \text{if } i > 0 \end{cases}$$

The subroutine T1 is called from the main program with
i = R1. Therefore, the recursion is solved by:

$$T_1(R1) = \sum_{j=R1}^{0} 1 = R1 + 1$$

The subroutine T1 is called one time more than the value of
R1, which we expected.
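
As a quick check of this count, the recursive form can be run with a call counter. This is a minimal Python sketch, not part of the original SPARKS text; the function name `t1_calls` is hypothetical, and the program variables R1, R2, R3 are held in a closure rather than passed as parameters.

```python
def t1_calls(r1):
    """Run the recursive translation of N_to_the_N and count calls on T1.

    Returns (number of calls on T1, final value of R2).
    """
    calls = 0
    r2, r3 = 1, r1

    def t1():
        nonlocal calls, r2, r3
        calls += 1                   # one more call on the subroutine
        if r3 > 0:
            r2, r3 = r2 * r1, r3 - 1
            t1()

    t1()
    return calls, r2


if __name__ == "__main__":
    for n in range(6):
        print(n, t1_calls(n))
```

For each non-negative input the first component should come out as R1 + 1, in agreement with the solved recurrence.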
ODD/EVEN Print Example

This example is a little more difficult. It involves
an if statement, but one which is completely determined by
the starting number. ODD(I) is a built-in function which
returns True if its argument is odd, and False if the
argument is even.

    procedure ODD_EVEN( N )
        I <- N
        while I >= 1 do
            Ta: print 'AAA'
            if ODD( I ) then
                I <- I - 3
            else
                I <- I + 1
            end if
        repeat
    end ODD_EVEN

The recursive form of the program is:

    procedure ODD_EVEN( N )
        I <- N
        call Ta( I )
    end ODD_EVEN

    procedure Ta( I )
        if I >= 1 then
            print 'AAA'
            if ODD( I ) then
                I <- I - 3
            else
                I <- I + 1
            end if
            call Ta( I )
        end if
    end Ta

Wegbreit [1] points out the idea for the next step and
goes into it in greater detail than we shall here. He
states, "The essential idea is to map a recursive procedure
P into a new recursive procedure whose value is the cost of
P." We are interested in the number of times that AAA is
printed. The recurrence relation for it is given by:

$$T_a(i) = \begin{cases} 0, & \text{if } i < 1 \\ 1 + T_a(i-3), & \text{if } i \text{ is odd} \\ 1 + T_a(i+1), & \text{if } i \text{ is even} \end{cases}$$

Starting with the case where i is odd, we have:

$$T_a(i_o) = 1 + T_a(i_o - 3)$$

Now, $i_o - 3$ is even so we have (assuming $i_o - 3 \ge 1$)

$$T_a(i_o) = 1 + 1 + T_a(i_o - 3 + 1) = 2 + T_a(i_o - 2)$$

Note that $i_o - 2$ is also odd.

We now examine the case when i is even:

$$T_a(i_e) = 1 + T_a(i_e + 1)$$

Now, $i_e + 1$ is odd, so we have

$$T_a(i_e) = 1 + 1 + T_a(i_e + 1 - 3) = 2 + T_a(i_e - 2)$$

Since the recursions for the odd and even cases have been
transformed to eliminate the dependence on parity, we have
the new recurrence relations:

$$T_a(i) = 2 + T_a(i-2), \quad \text{if } i \ge 2$$
$$T_a(1) = 1$$
$$T_a(0) = 0$$

Whose solution is easily shown to be $T_a(i) = i$. This
solution is exact for even i. For odd $i \ge 3$ the
assumption $i_o - 3 \ge 1$ fails on the final pass, where
$T_a(3) = 1 + T_a(0) = 1$, so that a direct count gives
$i - 2$.
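
The count of 'AAA' lines can be checked by running the loop itself. The Python sketch below is an illustration under stated assumptions: the loop test is read as I >= 1, the odd branch subtracts 3, and the even branch adds 1, as in the recurrence above. The function name `odd_even_prints` is hypothetical.

```python
def odd_even_prints(n):
    """Count how many times 'AAA' would be printed by ODD_EVEN(n)."""
    i, prints = n, 0
    while i >= 1:
        prints += 1                  # statement Ta: print 'AAA'
        if i % 2 == 1:               # ODD(I)
            i = i - 3
        else:
            i = i + 1
    return prints


if __name__ == "__main__":
    for n in range(0, 11):
        print(n, odd_even_prints(n))
```

For even starting values the count equals the starting value. For odd starting values of 3 or more the simulation gives two fewer, because the last odd step lands on 0 and ends the loop before the paired even step can occur.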
COINFLIP

COINFLIP is an algorithm which Ramshaw [5] uses. Here
we translate it into SPARKS. The built-in function RANDOM_ht
returns a value of Heads or Tails with equal probability.

    procedure COINFLIP
        I <- 0
        while RANDOM_ht = T do
            Tc: print 'ok, so far!'; I <- I + 1
        repeat
        print I, ' times!!'
    end COINFLIP


The recursive version is:

    procedure COINFLIP
        I <- 0
        call Tc( I )
        print I, ' times!!'
    end COINFLIP

    procedure Tc( I )
        if RANDOM_ht = T then
            print 'ok, so far!'; I <- I + 1
            call Tc( I )
        end if
    end Tc


The question "how many times will tails turn up in
succession?" is equivalent to asking how many times 'ok, so
far!' will be printed out. We see that:

$$T_c(i) = \begin{cases} 0, & \text{if } \mathrm{RANDOM}_{ht} = H \\ 1 + T_c(i+1), & \text{if } \mathrm{RANDOM}_{ht} = T \end{cases}$$

where $T_c$ is the number of times that the statement labeled
Tc in the original program is executed. If RANDOM_ht returns
H the first time that it is called, then the statement is
never executed. If RANDOM_ht always returns T, then the
program does not terminate. The in-between cases are the
interesting ones. What is the expected value of i, i.e. the
expected number of times that 'ok, so far!' is printed? To
answer this question requires an investigation of the part
that probability plays in the conditional statement. We
will come back to this question later.
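
Without anticipating the later treatment, a direct enumeration gives a feel for the numbers involved. The Python sketch below is an illustration only, assuming a fair coin and independent flips, so that exactly k prints occur with probability (1/2)^(k+1): k tails followed by one head. The function name `coinflip_partial_expectation` is hypothetical. It accumulates the probability mass and the partial expected value over runs of at most `max_tails` tails, using exact fractions.

```python
from fractions import Fraction


def coinflip_partial_expectation(max_tails):
    """Sum Pr(k prints) and k * Pr(k prints) for k = 0..max_tails."""
    half = Fraction(1, 2)
    total_prob = Fraction(0)
    expectation = Fraction(0)
    for k in range(max_tails + 1):
        p_k = half ** (k + 1)        # k tails, then one head
        total_prob += p_k
        expectation += k * p_k
    return total_prob, expectation


if __name__ == "__main__":
    for cutoff in (2, 10, 40):
        prob, exp = coinflip_partial_expectation(cutoff)
        print(cutoff, float(prob), float(exp))
```

As the cutoff grows the accumulated probability approaches 1 and the partial expectation approaches 1 from below, which suggests the value the full analysis should produce.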


FINDMAX 

This algorithm has been used as an example by several
authors [5, 6, and 7]. It is the usual algorithm for
finding the maximum value of a set of numbers. This is the
first example which we have given in which the recursive
form of the algorithm is not obvious. For this reason we
will give the translation explicitly.

    procedure FINDMAX( A, N, XMAX )
        /* set XMAX to the maximum value in A(1:N), N>0. */
        XMAX <- A(1)
        L1: for I <- 2 to N do
            if A(I) > XMAX then XMAX <- A(I); end if
        repeat
    end FINDMAX

The recursive version of this program is:

    procedure FINDMAX( A, N, XMAX )
        /* set XMAX to the maximum value in A(1:N), N>0. */
        XMAX <- A(1); I <- 2
        call L1( A, N, I, XMAX )
    end FINDMAX

    procedure L1( A, N, I, XMAX )
        if I <= N then
            if A(I) > XMAX then
                T1: XMAX <- A(I); end if
            I <- I + 1
            call L1( A, N, I, XMAX )
        end if
    end L1

The next step is to convert the recursive algorithm
into a recurrence relation for the number of times that
control passes T1. In this case we are interested in the
number of times that a new maximum is found.

$$T(A, n, i, xmax) = \begin{cases} 1 + T(A, n, i+1, A(i)), & \text{if } A(i) > xmax \\ 0 + T(A, n, i+1, xmax), & \text{if } A(i) \le xmax \end{cases}$$

with the boundary condition T( A, n, k, xmax) = 0 for k>n.

Given a known input array, A(1:n), this recurrence
relation completely determines the value of T. If this were
all that could be learned, then it would not be very useful.
The answer could just as well be determined by instrumenting
the original algorithm with a test counter in the true
branch. In this case we observe that the true branch is
taken if the i-th element is the largest of the first i
elements. If $p_i$ is the probability that A(i) is the largest
of i elements we have:

$$T(A,i) = p_i + T(A,i+1)$$

as a description of the average behavior of the algorithm.
At this point we have dropped the arguments of T which
return the "answer" so that we can concentrate on the time
behavior.

If the elements A(i) are drawn from a uniform
distribution, then $p_i = \tfrac{1}{i}$ and

$$T(A,i) = \tfrac{1}{i} + T(A,i+1)$$

$$T(A,i) = 0, \quad \text{for } i > n$$

Since the initial value of i is 2, the solution to this
recursion is easily shown to be $T(A,2) = H_n - 1$, where
$H_n$ is the n-th harmonic number:

$$H_n = 1 + \tfrac{1}{2} + \tfrac{1}{3} + \cdots + \tfrac{1}{n}$$
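
This average can be confirmed for small n by brute force. The Python sketch below is illustrative only and makes an assumption explicit: the input is taken to be n distinct values with all n! orderings equally likely, which is the usual reading of the uniform case. The names `new_maxima`, `average_new_maxima`, and `harmonic` are hypothetical. It counts how often the true branch is taken over every permutation and compares the exact average with the harmonic number minus one.

```python
from fractions import Fraction
from itertools import permutations


def new_maxima(a):
    """Number of times the true branch of FINDMAX is taken for input a."""
    xmax, count = a[0], 0
    for x in a[1:]:
        if x > xmax:                 # control passes T1
            xmax, count = x, count + 1
    return count


def average_new_maxima(n):
    """Exact average of new_maxima over all orderings of 1..n."""
    perms = list(permutations(range(1, n + 1)))
    return Fraction(sum(new_maxima(p) for p in perms), len(perms))


def harmonic(n):
    """n-th harmonic number as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, average_new_maxima(n), harmonic(n) - 1)
```

Exact fractions are used so that the comparison is an equality test rather than a floating-point approximation. The enumeration is only practical for small n, since the number of orderings grows as n!.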

While  we  were  able  to  get  the  correct  solution,  this 
way  of  analyzing  the  algorithm  is  not  suited  for  automation. 
The  insight  into  the  distribution  of  the  data  and  its  effect 
on  the  probability  that  the  branch  would  be  taken  requires 
human-like  understanding. 


The  Problem  of  the  Conditional  Statement 
At this point, our approach has the same problem that
plagues the Electrical Network approach: it works fine if
one knows the branching probabilities. It was at this point
in our research that we went back and studied the work of
Wegbreit and Ramshaw more closely. We noted the strengths
and weaknesses which we described in Chapter 2. Knuth [6]
provides an analysis of FINDMAX which relies on some subtle
reasoning about left-to-right maxima among random
permutations. Since we plan to teach a computer how to do this
analysis, we wanted to keep any real "thinking" out of it
until absolutely necessary. In Wegbreit's and Ramshaw's
approaches, the fact that the program variables of interest
are random variables and have distributions is recognized.
However, most of their analyses are performed by making
assertions about the frequencies or probabilities of these
distributions, and then proving theorems about the
assertions. The problem of the "useless test" led us to
think that it might be useful to see what happened when one
followed the distributions themselves around the program.

At this point we had been concentrating so much on
understanding the true meaning of "differentially disjoint
vanilla assertions", the measure theory, and theorem proving
aspects of Ramshaw's frequency system [5], that we had
forgotten that his treatment of COINFLIP dealt with the
distributions themselves. It was only after we had devised a
major portion of our approach that we realized the great
similarity between ours and Ramshaw's frequency system (as it
stood in Chapter 5 of his thesis [5]). We then recognized
that we had continued down the path of following the
distributions, while Ramshaw had turned to follow the path of
proving theorems about frequentistic assertions.

CHAPTER 4


DEALING  WITH  CONDITIONAL  STATEMENTS 

In this chapter we introduce the central idea which, we
feel, is either a new idea or one which has been
inadequately expressed in the past. The problem with the
conditional statement stems from the normalizations required
when taking probabilities, so why not, we reasoned, put off
taking the probabilities as long as possible? Ramshaw's
thesis [5] was a key to this. We observed his abandoning of
his raw frequencies in favor of asserting predicates about
frequencies. Another key factor in our choosing this
direction was Jonassen and Knuth's paper on "A Trivial
Algorithm Whose Analysis Isn't" [8]. Here were these nice
joint probability distribution functions (p.d.f.) which
appeared from "directly translating the algorithm into
mathematical formalism." We set out to find the rules that
had to have been used to get to these simple recurrence
relations. Because we took so many wrong turns on our way to
our final ideas, we will abandon our historical presentation
in favor of a more expository one. We also have to abandon
our initial assessment that Ramshaw's approach was "too
mathematical". There seems to be no way to avoid mathematics
if one desires more than the analysis of the simplest
algorithms.

Algorithms  and  Probability  Distributions 
Each execution of an algorithm can be thought of as a
random experimental sample from the universe of possible
input data. We will be concerned with the behavior of the
probability distributions associated with the program
variables during execution of the algorithm. These probability
distributions can be thought of as the repository of all the
information about possible execution histories for an
algorithm. We perform the analysis of an algorithm's behavior
by manipulating these distributions to find probabilities for
various conditions. We can then use this information in any
of the analysis techniques (e.g., those given in Chapters 2
and 3), which work for known branching probabilities.

We begin by associating a random variable with each
algorithm or program variable. We will follow Ramshaw [5]
and differentiate between the two by continuing to represent
algorithm variables by upper-case character strings and
representing the corresponding random variable by the same
characters in lower-case letters. For example, the random
variable xmax is associated with the program variable XMAX.
The value of the random variable x at any time in the
execution of the algorithm is the value of the corresponding
algorithm variable at that time. Unlike Ramshaw, we have no
prohibition about mixing program and mathematical variables
in the same expression. In fact this will be how we get
some of our answers.

We define the probability set function, $P_X(A)$, to be
the probability that the program variable X is contained in
the set of possible values A, i.e., $P_X(A) = \Pr(X \in A)$. If
the set A is countable, we obtain the discrete probability
density function (p.d.f.), $f_X(x)$:

$$f_X(x) = \Pr(X \in A) \;\big|\; A = \{\ \text{some finite set of } x\text{'s}\ \} \qquad (4\text{-}1a)$$

If we let the set A be the set of all values of
$\{X \mid x < X \le x+dx\}$ we have the continuous probability
density function,

$$f_X(x) = \Pr(X \in A) \;\big|\; A = \{\ x < X \le x+dx\ \} \qquad (4\text{-}1b)$$

We will deal with the discrete type of random variable
in our formalism because of the fact that all values within
a computer can be mapped onto a finite set of integers. By
staying with discrete representations, we avoid the need for
the concept of "differential equality" which Ramshaw [5]
introduced to bridge the gap between continuous variables
and program equality expressions. We will develop a
notation which is very close to the calculus of finite
differences. Some of the rules which we will use will be
derived from analogous rules in continuous probability theory
and the calculus of continuous variables.

Equations (4-1) can be generalized to any finite number
of program variables by thinking of the X as a vector of the
n ordered program variables and x as an n dimensional random
vector. The random variables form a vector space, and $f_X$
is a functional over that space.

The joint p.d.f. of the program variables describes the
state of the program up to a point in the execution of the
program. If we have a loop translated into a recursive
subroutine call, and if we can describe the joint p.d.f.
before the next recursive call in terms of the joint p.d.f.
entering the body of the subroutine, then we have a
recurrence relation that we may be able to solve to get the
joint p.d.f. as a function of the number of calls on the
subroutine. This knowledge will allow us to calculate the
branching probabilities at any step in the process and hence
complete the analysis of the algorithms begun in Chapter 3.

Let  us  now  examine  the  behavior  of  the  joint  p.d.f. 
with  various  programming  constructs.  We  begin  with  the 
conditional  statement. 

Theorem 1:

If R is a deterministic logical relation of the program
variables then, the conditional statement

    if R then { } else { } endif

a.  Divides the joint p.d.f. entering the if statement
    into two parts by:

    1.  setting to zero all terms of the joint p.d.f.
        entering the then clause { } for which R is
        FALSE, and

    2.  setting to zero all terms of the joint p.d.f.
        entering the else clause { } for which R is
        TRUE.

b.  Forms the joint p.d.f. leaving the endif from the
    algebraic sum of the joint p.d.f.s leaving the two
    clauses.

We  will  not  present  a  formal  proof,  but  will  use 
Theorem  1  as  a  rule  and  see  how  it  handles  situations  for 
which  we  have  answers  by  other  means. 

The  effect  of  the  conditional  statement  on  the  joint 
p.d.f.  entering  each  clause  can  be  represented  in  a  compact 
manner  using  a  new  type  of  delta  function  which  we  will 
refer  to  as  the  Boolean  delta.  This  new  delta  function  is 
closely  related  to  the  Kronecker  and  Dirac  delta  functions, 
except  that  its  domain  is  a  Boolean  space  with  possible 
values  True  and  False.  The  Boolean  delta  maps  the  Boolean 
space  into  the  numbers  0  and  1. 

Definition 

Let R be a deterministic logical relation of program
variables, then the Boolean delta function

$$\delta(R) = \begin{cases} 1 & \text{if } R \text{ is TRUE} \\ 0 & \text{if } R \text{ is FALSE.} \end{cases}$$

It is easy to see that the following properties hold:

$$\delta(R) \cdot \delta(\neg R) = 0$$
$$\delta(R) + \delta(\neg R) = 1$$
$$\delta(R) = 1 - \delta(\neg R)$$
$$\delta(R \wedge S) = \delta(R) \cdot \delta(S)$$
$$\delta(R \vee S) = \delta(R) + \delta(S) - \delta(R) \cdot \delta(S)$$

With these properties one can find the Boolean delta of
any Boolean expression. We can now state a theorem about
the effects of the "useless test" on the joint p.d.f.

Theorem 2


Let $f_X(x)$ be the joint p.d.f. of the n program variables
$X_1, \ldots, X_n$ at a point in an algorithm just prior to the
"useless test",

    if R then nothing else nothing endif

where R is a deterministic logical relation on the program
variables X, and let $g_X(x)$ be the joint p.d.f. of the
program variables after the join at the endif, then
$g_X(x) = f_X(x)$.

Proof:

Using Theorem 1 and the Boolean delta $\delta(R)$ we have the
augmented algorithm:

              { $f_X(x)$ }
    if R
        then  { $f_X(x) \cdot \delta(R)$ }
            nothing
        else  { $f_X(x) \cdot \delta(\neg R)$ }
            nothing
    endif     { $g_X(x) = f_X(x)\,\delta(R) + f_X(x)\,\delta(\neg R)$ }
              { $g_X(x) = f_X(x) \cdot (\delta(R) + \delta(\neg R))$ }
              { $g_X(x) = f_X(x)$ }

Q.E.D.
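
Theorem 1 and the useless-test result can be exercised on a small discrete p.d.f. The Python sketch below is an illustration, not the authors' implementation: a p.d.f. is represented as a dictionary from program states to probabilities, and the names `delta`, `split`, and `join` are hypothetical. `split` multiplies every term by the Boolean delta of the relation (then clause) or of its negation (else clause), and `join` forms the algebraic sum at the endif.

```python
from fractions import Fraction


def delta(relation, state):
    """Boolean delta: 1 if the relation holds in this state, else 0."""
    return 1 if relation(state) else 0


def split(pdf, relation):
    """Theorem 1a: p.d.f.s entering the then clause and the else clause."""
    then_pdf = {s: p * delta(relation, s) for s, p in pdf.items()}
    else_pdf = {s: p * (1 - delta(relation, s)) for s, p in pdf.items()}
    return then_pdf, else_pdf


def join(pdf_a, pdf_b):
    """Theorem 1b: algebraic sum of the p.d.f.s leaving the two clauses."""
    out = {}
    for pdf in (pdf_a, pdf_b):
        for s, p in pdf.items():
            out[s] = out.get(s, 0) + p
    return out


if __name__ == "__main__":
    f = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
    is_even = lambda k: k % 2 == 0
    then_pdf, else_pdf = split(f, is_even)
    print(then_pdf, else_pdf)
    print(join(then_pdf, else_pdf) == f)   # useless test: nothing changes
```

Neither branch is renormalized: the two partial p.d.f.s each sum to less than one, and only their sum at the join is a proper distribution again.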

This discussion of the joint p.d.f. of the program
variables is very close to Ramshaw's [5] frequentistic states.
We can show that Ramshaw's frequentistic assertions can be
derived from marginal or joint p.d.f.s. Where we depart from
Ramshaw is that we will stay with the rules for the
transformation of the joint p.d.f. by the algorithms instead
of moving to the next higher level of abstraction, i.e. rules
for the transformation of assertions about the marginal or
joint p.d.f.s. It was this abstraction which destroyed the
ability of Ramshaw's system to handle the "useless test".


LEAPFROG  Revisited 


In  order  to  get  some  understanding  of  the  effects  of 
simple  assignment  statements,  let  us  look  again  at  LEAPFROG. 

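    Leapfrog: if K=0 then K <- K+2 endif

The input joint p.d.f. to Leapfrog is

$$f_K(k) = \tfrac{1}{2}\,\delta(k=0) + \tfrac{1}{2}\,\delta(k=1)$$

which simply means that $\Pr(k=0) = \tfrac{1}{2}$ and $\Pr(k=1) = \tfrac{1}{2}$.

The augmented program would be:

    if K=0 then   { $\delta(k=0)\,(\tfrac{1}{2}\delta(k=0) + \tfrac{1}{2}\delta(k=1))$ }
                  { $\tfrac{1}{2}\delta(k=0)$ }
        K <- K+2  { $\tfrac{1}{2}\delta((k-2)=0)$ }
                  { $\tfrac{1}{2}\delta(k=2)$ }
    [ else ]      { $\delta(k \neq 0)\,(\tfrac{1}{2}\delta(k=0) + \tfrac{1}{2}\delta(k=1))$ }
                  { $\tfrac{1}{2}\delta(k=1)$ }
    endif         { $\tfrac{1}{2}\delta(k=2) + \tfrac{1}{2}\delta(k=1)$ }

Which is exactly what we should get.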
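
The same bookkeeping can be carried out mechanically. The Python sketch below is illustrative only, with the hypothetical function name `leapfrog`: a p.d.f. over K is a dictionary from values to probabilities, terms that the Boolean delta would set to zero are simply dropped, the assignment K <- K+2 moves each surviving term in the then branch to the key k+2, and the two branches are summed at the endif.

```python
from fractions import Fraction


def leapfrog(pdf):
    """Push a p.d.f. over K through: if K=0 then K <- K+2 endif."""
    # Restriction at the if-test (zero terms are dropped, not stored).
    then_pdf = {k: p for k, p in pdf.items() if k == 0}
    else_pdf = {k: p for k, p in pdf.items() if k != 0}
    # Assignment K <- K+2 in the then branch: mass at k moves to k+2.
    shifted = {k + 2: p for k, p in then_pdf.items()}
    # Join at the endif: algebraic sum of the two branch p.d.f.s.
    out = dict(else_pdf)
    for k, p in shifted.items():
        out[k] = out.get(k, 0) + p
    return out


if __name__ == "__main__":
    half = Fraction(1, 2)
    print(leapfrog({0: half, 1: half}))
    print(leapfrog({0: half, 2: half}))
```

The second call shows why the join is a sum rather than a conjunction: when both branches contribute to the same value of K, their probabilities add.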

In handling the assignment statement, K <- K+2, we
observed that it maps k as follows:

    <k>  ->  <k-2>.

In general, if we wish to keep the equations in terms
of the original variables, we have:

    [ X_i <- X_i + c ] :

    <x_1, x_2, ..., x_i, ..., x_n>  ->  <x_1, x_2, ..., x_i - c, ..., x_n>.

Next we will look again at the COINFLIP algorithm. To
do that we need some rules about the effects of a
conditional statement which contains a non-deterministic
part. We can easily transform a non-deterministic relation
into a non-deterministic assignment followed by a
deterministic conditional statement. For example:

    if X = RANDOM_HT then { S_t } else { S_f } endif

becomes

    Y <- RANDOM_HT
    if X=Y then { S_t } else { S_f } endif.

Theorem 3

Let f_X(x) be the joint p.d.f. of the n program variables
X_1, X_2, ..., X_n in the algorithm just prior to the conditional
statement

    if R then { S_t } else { S_f } endif

where R is a logical relation containing a finite number, m,
of random (possibly pseudo-random) functions RANDOM_fj. Let
R' be derived from R by replacing each instance of RANDOM_fj
with a reference to a new program variable Y_j; then the
following sequence of statements is equivalent to the original
statement:

    Y_1 = RANDOM_f1
    Y_2 = RANDOM_f2
    ...
    Y_m = RANDOM_fm
    if R' then { S_t } else { S_f } endif

Theorem 4

Let f_X(x), the joint p.d.f. of program variables
X_1, X_2, ..., X_n, have been defined, and let Y be a "new"
variable defined by the statement Y <- RANDOM_g, where
RANDOM_g generates a statistically independent random number
from distribution g(y); then the joint p.d.f. after this
statement, h_Z(z), is

    h_Z(z) = f_X(x)·g(y)

where

    z = <x_1, x_2, ..., x_n, y>
    Z = <X_1, X_2, ..., X_n, Y>.
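
Theorems 3 and 4 together say that a random call inside a test can be hoisted into a fresh variable whose p.d.f. multiplies the existing joint p.d.f. The sketch below illustrates this on a toy case; the state-tuple representation, the function names and the particular distributions are assumptions made for the example only.

```python
from fractions import Fraction
from itertools import product

def extend_with_random(joint, g):
    """Y <- RANDOM_g: h_Z(x, y) = f_X(x) * g(y), with Y independent of X."""
    return {x + (y,): p * q
            for (x, p), (y, q) in product(joint.items(), g.items())}

def restrict(joint, predicate):
    """Multiply the joint p.d.f. by the Boolean delta of `predicate`."""
    return {z: p for z, p in joint.items() if predicate(z)}

# Hypothetical f_X over one variable X (states are 1-tuples) and a fair RANDOM_g.
f_X = {("H",): Fraction(1, 3), ("T",): Fraction(2, 3)}
g = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

h_Z = extend_with_random(f_X, g)

# The test "if X = RANDOM_g" becomes the deterministic test "if X = Y".
then_branch = restrict(h_Z, lambda z: z[0] == z[1])
else_branch = restrict(h_Z, lambda z: z[0] != z[1])
```

After the extension the conditional is deterministic on the enlarged state, so the earlier rules for deterministic tests apply unchanged.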

It is now time to examine the general assignment statement
between two program variables. We will use a
memory-to-register, register-to-memory model for the assignment
statement. This will allow us to have the statement X <- X
be a NOOP in the formalism without any special rules. We
introduce the notation

    Σ_{x_i}

to mean the summation over all values of random variable x_i.
This is the discrete equivalent of the definite integral.
When it is applied to a function of x_i, the result does not
depend on x_i. If this summation is done symbolically, all
occurrences of x_i are removed from the equation of the
result. Here are some properties of this summation which we
shall use later:

    Σ_{x_j} f(x_j) = 1

when f(x_j) is a p.d.f.

    Σ_{x_i} ( f(x_i)·δ(x_i=x_j) ) = f(x_j)

    Σ_{x_i} ( f(x_i)·δ(x_i≤x_j) ) = F(x_j)

where F(x_j) = Pr( X ∈ A ), A = { X ≤ x_j }, is the cumulative
probability density function (c.p.d.f.) for f. Note that in
the case of discrete random variables we usually have to
worry about whether or not the c.p.d.f. is defined to
include x_j or whether it is just "up to" x_j. In the
continuous representation we would not have to worry about this
because the two are equivalent.

Theorem 5

Let f_X(x) be the joint p.d.f. of the n program variables
X_1, X_2, ..., X_n just before the program statement

    X_i <- X_j

Then the joint p.d.f. after this assignment statement is

    g_X(x) = Σ_r ( ( Σ_{x_i} f_X(x)·δ(x_j=r) )·δ(r=x_i) )

The application of δ(x_j=r) within the summation takes
care of the case when x_i is the same variable as x_j. In the
cases where x_i and x_j are different variables, the rule
reduces to:

    g_X(x) = ( Σ_{x_i} f_X(x) )·δ(x_i=x_j)

For an example we will look at a simple program which
interchanges the contents of two variables X_1 and X_2 using a
third variable X_3 as temporary storage. The augmented
program goes like this:

    X_3 initially 0  { f_X(x_1,x_2,x_3) = g_X(x_1,x_2)·δ(x_3=0) }
    X_3 <- X_1       { f_X(x_1,x_2,x_3) = g_X(x_1,x_2)·δ(x_3=x_1) }
    X_1 <- X_2       { f_X(x_1,x_2,x_3) = g_X(x_3,x_2)·δ(x_1=x_2) }
    X_2 <- X_3       { f_X(x_1,x_2,x_3) = g_X(x_3,x_1)·δ(x_2=x_3) }

Note that we need not have assumed that X_3 initially
contained 0. We could have started with the general joint
p.d.f. g_X(x_1,x_2,x_3). Then the first assignment would have
resulted in:

    X_3 <- X_1  { f_X(x_1,x_2,x_3) = ( Σ_{x_3} g_X(x_1,x_2,x_3) )·δ(x_3=x_1) }
                                   = g_X(x_1,x_2)·δ(x_3=x_1)

where g_X(x_1,x_2) = Σ_{x_3} g_X(x_1,x_2,x_3).

The remainder of the example would be as before.
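
The assignment rule and the interchange example can be exercised on a small discrete joint p.d.f. The following sketch assumes states are tuples (x_1, x_2, x_3) with 0-based positions; the function assign and the sample distribution g are hypothetical and serve only to illustrate the load/store reading of the rule.

```python
from fractions import Fraction

def assign(joint, i, j):
    """X_i <- X_j on a joint p.d.f. keyed by state tuples (0-based positions).

    Load r = x_j, sum out the old x_i, then force x_i = r.
    """
    out = {}
    for state, p in joint.items():
        r = state[j]                                   # register load
        new_state = state[:i] + (r,) + state[i + 1:]   # store into x_i
        out[new_state] = out.get(new_state, 0) + p     # summation over old x_i
    return out

# Hypothetical joint p.d.f. g(x1, x2), with X3 initially 0.
g = {(1, 2): Fraction(1, 4), (1, 3): Fraction(1, 4), (5, 2): Fraction(1, 2)}
f0 = {(x1, x2, 0): p for (x1, x2), p in g.items()}

after_noop = assign(f0, 0, 0)   # X1 <- X1 should change nothing

f1 = assign(f0, 2, 0)           # X3 <- X1
f2 = assign(f1, 0, 1)           # X1 <- X2
f3 = assign(f2, 1, 2)           # X2 <- X3

# Marginal over (x1, x2) after the interchange.
swapped = {}
for (x1, x2, _x3), p in f3.items():
    swapped[(x1, x2)] = swapped.get((x1, x2), 0) + p
```

In this toy run the marginal over (x_1, x_2) is the original g with its arguments exchanged, and the self-assignment leaves the p.d.f. untouched without a special case.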


COINFLIP  Revisited 

We  now  have  all  the  tools  to  handle  COINFLIP  and  get  the 
real  answer  in  a  systematic  way.  The  annotated  main  program 
is: 

    procedure COINFLIP
        I <- 0              { f_I(i) = δ(i=0) }
        call TC(I)          { f_I(i) = g(i) }
        print I,' times.'   { f_I(i) = g(i) }

The problem is to determine what the function g(i)
looks like. This is, of course, determined by the
subroutine TC. We now proceed to the analysis of TC. Assume
that the p.d.f. entering TC is f_I(i).

    procedure TC(I)
        Y <- RANDOM_HT       { f_I(i)·( (1/2)·δ(y=H) + (1/2)·δ(y=T) ) }
        if Y = T then        { f_I(i)·(1/2)·δ(y=T) }
            print 'OK, so far!'
            I <- I + 1       { f_I(i-1)·(1/2)·δ(y=T) }
            call TC(I)       { g_I(i) }
        endif
                             { g_I(i) + f_I(i)·(1/2)·δ(y=H) }
    end TC

Where g_I(i) represents the value of I returned by the
recursive call to TC. Now, the distribution
{ f_I(i-1)·(1/2)·δ(y=T) } is presented to the next call of
TC(I), so we must have in general:

    f_I(i) = f_I(i-1)·(1/2)·δ(y=T)

Since the variable Y is local to TC(I), it must be
eliminated from the joint p.d.f. that is returned. We will
refer to this process as "killing" a variable. This is done
by finding the marginal p.d.f. of I with respect to y:

    f_I(i) = Σ_y f_I(i-1)·(1/2)·δ(y=T) = (1/2)·f_I(i-1)

Note that if Y were to be treated as a global variable, this
step would take place as part of the RANDOM_HT assignment
statement. The initial condition from the main program is
f_I(i) = δ(i=0), so the distribution for the first recursive
call is:

    f_I(i) = (1/2)·δ(i-1=0) = (1/2)·δ(i=1)

and in general we see that

    f_I(i) = (1/2)^j·δ(i=j)

where j is the number of times that 'OK, so far!' has been
printed out. This distribution represents the part of the
distribution which is "caught in the loop". Each time some
of the distribution "escapes". This corresponds to the
chance that Heads will turn up at any time. For each value
of j, the joint p.d.f. that "escapes" is
(1/2)^j·δ(i=j)·(1/2)·δ(y=H); this joins the rest at the endif
to give the final answer:

    g_I(i) = (1/2)·Σ_j (1/2)^j·δ(i=j),   j ∈ { 0, 1, 2, ... }

We note that this is in fact a normalized p.d.f. What is
the expected value of I?

    E[I] = Σ_i i·(1/2)·Σ_j (1/2)^j·δ(i=j),   i,j ∈ { 0, 1, 2, ... }

         = (1/2)·( 0·1 + 1·(1/2) + 2·(1/2)^2 + ... )

by distributing and regrouping each fraction we get:

         = (1/2)·( 2 ) = 1
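
The normalization and the expected value can be confirmed by unrolling the recursion to a finite depth. This is a sketch under the assumption of a fair coin; the function name coinflip_pdf and the cut-off depth of 40 are arbitrary choices for illustration.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def coinflip_pdf(max_depth):
    """Unroll TC: return the escaped mass g(i) for i < max_depth and the mass still trapped."""
    escaped = {}
    trapped = Fraction(1)              # f_I(i) = delta(i=0) on entry
    for j in range(max_depth):
        escaped[j] = trapped * HALF    # Heads: leaves TC with I = j
        trapped = trapped * HALF       # Tails: I <- I + 1 and recurse
    return escaped, trapped

g, trapped = coinflip_pdf(40)
total = sum(g.values()) + trapped                 # escaped plus trapped mass
partial_mean = sum(i * p for i, p in g.items())   # approaches E[I] from below
```

The escaped and trapped mass always sum to one, and the partial mean approaches 1 as the depth grows.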


If we had performed this analysis on Ramshaw's [5]
version of COINFLIP,

    C <- 0;
    loop X <- RANDOM_HT; C <- C + 1; while X=T repeat

we would have gotten the final joint p.d.f.:

    δ(x=H)·Σ_j (1/2)^j·δ(c=j),   j ∈ { 1, 2, 3, ... }

This contains all of the information that is in Ramshaw's
output assertion for the same problem [5, p.78]:

    [Fr(C<1)=0] ∧ [Fr(X=T)=0] ∧ ∧_{c≥1} [Fr(C=c, X=H) = (1/2)^c]


FINDMAX  Revisited 

We  will  again  follow  Ramshaw  [5,  p.81]  and  use  a 

slightly  different  form  of  the  FINDMAX  program  than  was 
presented  in  Chapter  3.  We  will  replace  the  input  array 
A(I)  of  random  variables  by  repeated  calls  to  a  random 
number  generator.  This  simplifies  the  notation  somewhat 
without  sacrificing  generality.  We  will  return  to  the  array 
notation  when  we  deal  with  the  sorting  algorithms.  The 
program  is  instrumented  to  record  the  number  of  times  a  new 
maximum  is  selected.  The  modified  and  annotated  program  in 
recursive  form  is; 

    procedure FINDMAX( N, M )
        C <- 0; I <- 2           { δ(c=0)·δ(i=2) }
        M <- RANDOM_f            { δ(c=0)·δ(i=2)·f(m) }
        call LOOP1( N,M,C,I )    { g(n,m,c,i) }
    end FINDMAX

    procedure LOOP1( N,M,C,I )   { h(n,m,c,i) }
        if I ≤ N then
                                 { h(n,m,c,i)·δ(i≤n) }
            T <- RANDOM_f        { h(n,m,c,i)·δ(i≤n)·f(t) }
            if T>M then          { h(n,m,c,i)·δ(i≤n)·f(t)·δ(t>m) }
                C <- C + 1       { h(n,m,c-1,i)·δ(i≤n)·f(t)·δ(t>m) }
                M <- T
                    { δ(m=t)·( Σ_m h(n,m,c-1,i)·δ(t>m) )·δ(i≤n)·f(t) }
            [else]               { h(n,m,c,i)·δ(i≤n)·f(t)·δ(t≤m) }
            endif
                    { δ(m=t)·( Σ_m h(n,m,c-1,i)·δ(t>m) )·δ(i≤n)·f(t)
                      + h(n,m,c,i)·δ(i≤n)·f(t)·δ(t≤m) }
            I <- I + 1
                    { δ(i-1≤n)·( δ(m=t)·( Σ_m h(n,m,c-1,i-1)·δ(t>m) )·f(t)
                      + h(n,m,c,i-1)·f(t)·δ(t≤m) ) }
            call LOOP1( N,M,C,I )
                    { g(n,m,c,i) }
        endif
                    { h(n,m,c,i)·δ(i>n) + g(n,m,c,i) }
Note that all of the joint p.d.f. is caught in the loop
or recursive calls until I is incremented past N. The
recursion which we must solve is:

    h(n,m,c,i) = δ(i-1≤n)·( δ(m=t)·( Σ_m h(n,m,c-1,i-1)·δ(t>m) )·f(t)
                            + h(n,m,c,i-1)·f(t)·δ(t≤m) )

T is a local variable to LOOP1 and not sent outside that
subroutine so we must "kill" it.

    h(n,m,c,i) = Σ_t δ(i-1≤n)·( δ(m=t)·( Σ_m h(n,m,c-1,i-1)·δ(t>m) )·f(t)
                                + h(n,m,c,i-1)·f(t)·δ(t≤m) )

At first glance, this recursion doesn't look very useful.
To get a handle on what is going on, we will follow the
first few iterations of the program. In doing so we will
drop the termination delta function. The initial call is
made with

    h(n,m,c,i) = δ(c=0)·f(m)·δ(i=2)

Applying the rules we find that

    h(n,m,c-1,i-1) = δ(c=1)·f(m)·δ(i=3)

and

    h(n,m,c,i-1) = δ(c=0)·f(m)·δ(i=3)

so we have

    h(n,m,c,i) = δ(i=3)·Σ_t { δ(c=1)·δ(m=t)·( Σ_m f(m)·δ(t>m) )·f(t)
                              + δ(c=0)·f(m)·f(t)·δ(t≤m) }

    h(n,m,c,i) = δ(i=3)·Σ_t { δ(c=1)·δ(m=t)·F(t)·f(t)
                              + δ(c=0)·f(m)·f(t)·δ(t≤m) }

    h(n,m,c,i) = δ(i=3)·{ δ(c=1)·F(m)·f(m) + δ(c=0)·f(m)·F(m) }

We can rewrite this into an equivalent form

    h(n,m,c,i) = δ(i=3)·{ 2·F(m)·f(m)·( (1/2)·δ(c=1) + (1/2)·δ(c=0) ) }

If we crank through another iteration we get:

    h(n,m,c,i) = δ(i=4)·{ 3·F^2(m)·f(m)·( (1/6)·δ(c=2) + (1/2)·δ(c=1) + (1/3)·δ(c=0) ) }

The third time around we get:

    h(n,m,c,i) = δ(i=5)·{ 4·F^3(m)·f(m)·( (1/24)·δ(c=3) + (6/24)·δ(c=2)
                                         + (11/24)·δ(c=1) + (6/24)·δ(c=0) ) }

Each time that we cycle through the equations we find that
the joint p.d.f. is a product of the marginal p.d.f.s of the
individual variables. We have factored the coefficients to
normalize the marginal p.d.f.s with respect to m and c.
When the joint p.d.f. of a set of random variables can be
written as the product of their respective marginal p.d.f.s,
then the variables are said to be stochastically independent.
This is a very important thing for us to confirm in
this case. It tells us that we have not affected the
distribution of the maximum value by instrumenting the
program. The stochastic independence also simplifies the
solution of the recurrence relations. Because of it we can
set up a recursion for each variable separately by following
the marginal p.d.f. for each variable. We change the
induction variable from i to j = i - 1 so that the formulas
will look more familiar.

    f_M(m)_j = ( j/(j-1) )·F(m)·f_M(m)_{j-1}

and

    f_C(c)_j = (1/j)·f_C(c-1)_{j-1} + ( (j-1)/j )·f_C(c)_{j-1}

The recursion for f_M(m) gives the final distribution of M

    f_M(m)_n = n·F^{n-1}(m)·f(m)

which is the answer given by Hogg [12]. The recursion for
f_C(c) is the same as Knuth's [6] and Ramshaw's [5].
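
The distribution of the counter C can be cross-checked against a brute-force enumeration. The sketch below assumes the standard recurrence for the number of times a new maximum is selected (a new value exceeds the running maximum with probability 1/j at step j), which is taken here to be the recursion for f_C(c) referred to above; both function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def counter_pdf(n):
    """Distribution of C after n values, via the assumed recurrence
    f_C(c)_j = (1/j) f_C(c-1)_{j-1} + ((j-1)/j) f_C(c)_{j-1}."""
    pdf = {0: Fraction(1)}                    # one value seen, no new maximum yet
    for j in range(2, n + 1):
        nxt = {}
        for c, p in pdf.items():
            nxt[c + 1] = nxt.get(c + 1, 0) + p * Fraction(1, j)   # T > M
            nxt[c] = nxt.get(c, 0) + p * Fraction(j - 1, j)       # T <= M
        pdf = nxt
    return pdf

def counter_pdf_brute(n):
    """Same distribution by enumerating all orderings of n distinct values."""
    perms = list(permutations(range(n)))
    counts = {}
    for perm in perms:
        m, c = perm[0], 0
        for t in perm[1:]:
            if t > m:
                m, c = t, c + 1
        counts[c] = counts.get(c, 0) + Fraction(1, len(perms))
    return counts
```

For three and four values the recurrence reproduces the coefficients of the Boolean deltas in c found in the second and third iterations above.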




CHAPTER  5 


APPLICATION  TO  SORTING  AND  SEARCHING 

We now turn our attention to the further application of
our approach to sorting and searching algorithms. We will
look at three such algorithms: the "oblivious" Insertion
(Bubble) Sort, the "improved" Insertion Sort, and Binary
Search.


"Oblivious"  Insertion  Sort 

Insertion Sort was used by Wegbreit [2] as the example
for verifying program performance. He used the "improved"
version which has an exit in the inner loop after each
candidate element is properly positioned. The "oblivious"
version of this program does not have this exit. It
continues to compare the element being inserted to all of the
elements in the sorted sublist. While it is an inefficient
software algorithm, this version of the algorithm is of
interest because it can be realized using a network of
comparators (i.e. using hardware logic circuits).

 1   procedure INSERTION SORT ( B, N )
 2       real B(1:N)
 3       OUTER:
         for J <- 1 to N-1 do
 4           INNER:
             for I <- J to 1 by -1 do
 5               if B(I) > B(I+1) then
 6                   EXCHANGE ( B(I), B(I+1) )
 7               endif
 8           repeat
 9       repeat
10   end INSERTION SORT

The first step is to convert the loops to recursive
subroutine calls. We will number the statements so that
they may be related back to the original program. We will
also insert a counter variable, Y, to keep track of the
number of times an EXCHANGE takes place.

 1   procedure INSERTION SORT ( B, N )
 2       real B(1:N)
         global integer Y
 3a      J <- 1; Y <- 0
 3b      call OUTER( J, N-1, B )
10   end INSERTION SORT

 3c  procedure OUTER( J, LIM, B )
 3d      if LIM-J ≥ 0 then
 4a          I <- J
 4b          call INNER( I, B )
 9a          J <- J + 1
 9b          call OUTER( J, LIM, B )
 9c      endif
 9d  end OUTER

 4c  procedure INNER( I, B )
 4d      if I ≥ 1 then
 5           if B(I) > B(I+1) then
 6               EXCHANGE ( B(I), B(I+1) )
 6a              Y <- Y + 1
 7           endif
 8a          I <- I - 1
 8b          call INNER ( I, B )
 8c      endif
 8d  end INNER
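
For readers who want to execute the recursive form, the following is a direct Python transcription. It is a sketch: it assumes the tests at 3d and 4d read LIM-J ≥ 0 and I ≥ 1, uses a padded list to keep the 1-based indexing of B(1:N), and returns the counter Y alongside the sorted list.

```python
def insertion_sort_oblivious(values):
    """Recursive 'oblivious' insertion sort; returns (sorted list, exchange count Y)."""
    b = [None] + list(values)      # pad so that b[1..n] mirrors B(1:N)
    n = len(values)
    y = 0                          # counter variable Y

    def inner(i):                  # statements 4c-8d
        nonlocal y
        if i >= 1:
            if b[i] > b[i + 1]:
                b[i], b[i + 1] = b[i + 1], b[i]   # EXCHANGE
                y += 1
            inner(i - 1)           # no early exit: every position is compared

    def outer(j, lim):             # statements 3c-9d
        if lim - j >= 0:
            inner(j)
            outer(j + 1, lim)

    outer(1, n - 1)
    return b[1:], y
```

Because each exchange removes exactly one inversion, the returned count equals the number of inversions in the input, for example 6 for a reversed list of four distinct keys.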

Appendix A contains a detailed, line-by-line tracing of
the joint p.d.f. which is used in an "average case"
analysis. From it we can develop the form which the
distribution of a "sorted" list takes. Specifically, we have:

    δ(b_N>b_{N-1})···δ(b_2>b_1)·f'(b_1,b_2,...,b_N)

where f'(b_1,b_2,...,b_N) is some transformation of the initial
joint p.d.f. The leading product of Anderson deltas
contains the information that the list is sorted. This may
seem like a simple thing, but remember that having started
with an algorithm and the assertion that it "sorts a list",
we have arrived at a form of joint p.d.f. which means "the
list is sorted". If we were to give an automatic analyzer
an algorithm, and if it came up with a final joint p.d.f.
that had this form, the automatic analyzer could say, "this
algorithm sorts a list." Conversely, if the analysis does
not result in a joint p.d.f. of this form then the analyzer
can say, "this algorithm does not sort a list."

When analyzing sorting algorithms, three different
types of input distributions are usually used. These
represent the initially sorted list, the initially reverse
sorted list, and the initially "random" list. These three
sometimes cover the best, worst, and average case execution
times, although not necessarily in that order. In some more
exotic algorithms, there is a more complicated input
distribution which leads to the best or worst case behavior. Our
approach can be used to determine the best and worst case
distributions, although we will not dwell on this. The best
case performance for Insertion Sort comes when the EXCHANGE
never takes place, and the worst case performance comes when
the exchange always takes place.

The work shown in Appendix A, for the average case
analysis, suggests the induction hypothesis that if you give
INNER, at its call from OUTER, the distribution

    δ(i=j)·δ(j<n)·δ(j=k)·
    k!·δ(b_k>b_{k-1})···δ(b_2>b_1)·f(b_1)·f(b_2)···f(b_N),

INNER returns the distribution

    δ(i=0)·δ(j<n)·δ(j=k)·
    (k+1)!·δ(b_{k+1}>b_k)···δ(b_2>b_1)·f(b_1)·f(b_2)···f(b_N).

In other words, INNER inserts the (k+1)th element into the
sorted list of the first k elements. We are therefore
justified in picking as the general form for a joint p.d.f.
going into INNER

    δ(i=m)·δ(m<j)·δ(j<n)·
    δ(b_j>b_{j-1})···δ(b_2>b_1)·f(y,b_1,b_2,...,b_j,...,b_N).

Rather than doing that, let us start with a completely
general joint p.d.f. g(j,i,n,y,b_1,b_2,...,b_N) after 4c.

After 4d, in the true branch:

    δ(i≥1)·g(j,i,n,y,b_1,b_2,...,b_N)

Sent to 8c, in the false branch:

    δ(i=0)·g(j,i,n,y,b_1,b_2,...,b_N)

After 5, in the true branch:

    δ(i≥1)·δ(b_i>b_{i+1})·g(j,i,n,y,b_1,b_2,...,b_N)

Sent to 7, in the false branch is:

    δ(i≥1)·δ(b_{i+1}≥b_i)·g(j,i,n,y,b_1,b_2,...,b_N)

After 6,

    δ(i≥1)·δ(b_{i+1}>b_i)·g(j,i,n,y,b_1,b_2,..,b_{i+1},b_i,..,b_N)

After 6a,

    δ(i≥1)·δ(b_{i+1}>b_i)·g(j,i,n,y-1,b_1,b_2,..,b_{i+1},b_i,..,b_N)

After 7,

    δ(i≥1)·δ(b_{i+1}≥b_i)·
    ( g(j,i,n,y-1,b_1,b_2,..,b_{i+1},b_i,..,b_N)
    + g(j,i,n,y,b_1,b_2,..,b_i,b_{i+1},..,b_N) )

After 8a,

    δ(i+1≥1)·δ(b_{i+2}≥b_{i+1})·
    ( g(j,i+1,n,y-1,b_1,b_2,..,b_{i+2},b_{i+1},..,b_N)
    + g(j,i+1,n,y,b_1,b_2,..,b_{i+1},b_{i+2},..,b_N) )

We have arrived at the recursive calling of INNER, so
we must have:

    g(j,i,n,y,b_1,b_2,...,b_N) =
        δ(i+1≥1)·δ(b_{i+2}≥b_{i+1})·
        ( g(j,i+1,n,y-1,b_1,b_2,..,b_{i+2},b_{i+1},..,b_N)
        + g(j,i+1,n,y,b_1,b_2,..,b_{i+1},b_{i+2},..,b_N) )

From the other parts of the algorithm, we get the
boundary conditions

    δ(j<N)·δ(i≤j)

and the initial condition

    g(j,i,n,y,b_1,b_2,...,b_N) =
        δ(i=j)·δ(n=N)·h(y)·δ(b_j>b_{j-1})···δ(b_2>b_1)·f(b_1,b_2,...,b_N),

assuming that f is symmetric with respect to interchange of
variables.

Note that this is a "backward" recursion, i.e. we start
with i=j and move backward to the desired answer for i=0.
Once we have solved the recursive relationship for INNER
(based on i), we can use that to solve the recursive
relation for OUTER (based on j), which gives the final answer
for the joint p.d.f. Doing this in the general case cannot
result in a closed form answer in the usual sense. It is
possible to "write down" the general solution for any given
N, but the equation would be equivalent to the one that we
would get if we were to "unwind" the loops into straight
line code. In order to get really useful results, we need
to select the form of the joint p.d.f. for the unsorted
list.

Once one has selected an initial joint p.d.f., and
solved the recursion relations, one has a joint p.d.f. which
represents the distributions of the variables at the
termination of the algorithm. The distribution of the counter
variable is then isolated by summation (integration) over
all the other variables. This marginal p.d.f. is then used
to find the expected value, variance, and other statistics
in the usual manner.


"Improved" Insertion Sort

The  relative  performance  of  the  "oblivious"  insertion 
sort  can  be  improved,  by  noting  that  the  portion  of  the 
joint  p.d.f.  that  fails  the  test  at  statement  5,  is  already 
in  sorted  order.  We  can  exit  from  the  INNER  loop  at  this 
point  without  affecting  the  algorithm's  ability  to  sort. 
Such  "obvious"  improvements  often  have  hidden  side  effects, 
but  our  method  will  let  us  prove  that  the  modified  algorithm 
still  sorts.  It  also  turns  out  that  the  distribution  of  I 
will  give  a  direct  indication  of  the  algorithm’s 
performance.  For  this  reason,  we  will  delete  the  counter 
variable  Y. 

 1   procedure INSERTION SORT ( B, N )
 2       real B(1:N)
 3       OUTER:
         for J <- 1 to N-1 do
 4           INNER:
             for I <- J to 1 by -1 do
 5               if B(I) > B(I+1) then
 6                   EXCHANGE ( B(I), B(I+1) )
 6a              else exit   /* This is the change */
 7               endif
 8           repeat
 9       repeat
10   end INSERTION SORT

The recursive equivalent is:

 1   procedure INSERTION SORT ( B, N )
 2       real B(1:N)
 3a      J <- 1
 3b      call OUTER( J, N-1, B )
10   end INSERTION SORT

 3c  procedure OUTER( J, LIM, B )
 3d      if LIM-J ≥ 0 then
 4a          I <- J
 4b          call INNER( I, B )
 9a          J <- J + 1
 9b          call OUTER( J, LIM, B )
 9c      endif
 9d  end OUTER

 4c  procedure INNER( I, B )
 4d      if I ≥ 1 then
 5           if B(I) > B(I+1) then
 6               EXCHANGE ( B(I), B(I+1) )
 6a          else return
 7           endif
 8a          I <- I - 1
 8b          call INNER ( I, B )
 8c      endif
 8d  end INNER

The return in the recursive program is equivalent to
the exit in the loop version. Everything works the same as
before up to statement 6a. At this point, the joint p.d.f.
from the false branch "escapes" from INNER. We will pick up
the analysis at that point on the J=1 iteration.

5   This is the first test involving the data itself. This
    statement splits the joint p.d.f. on the basis of the
    values of B(I) and B(I+1).

    In the true branch:
        δ(i≥1)·δ(i=j)·δ(j<n)·δ(j=1)·δ(b_1>b_2)·
        f(b_1)·f(b_2)···f(b_N)

    In the false branch:
        δ(i≥1)·δ(i=j)·δ(j<n)·δ(j=1)·δ(b_2>b_1)·
        f(b_1)·f(b_2)···f(b_N)

6   This EXCHANGES the values of b_2 and b_1
        δ(i≥1)·δ(i=j)·δ(j<n)·δ(j=1)·δ(b_2>b_1)·
        f(b_2)·f(b_1)···f(b_N)

6a  This sends the false branch joint p.d.f. back to OUTER.
        δ(i≥1)·δ(i=j)·δ(j<n)·δ(j=1)·δ(b_2>b_1)·
        f(b_1)·f(b_2)···f(b_N)
    It is accumulated there as we shall see.

7   At the join for the if statement we have only the true
    branch left
        δ(i≥1)·δ(i=j)·δ(j<n)·δ(j=1)·δ(b_2>b_1)·
        f(b_1)·f(b_2)···f(b_N)

8a  This adjusts I for the next iteration
        δ(i+1≥1)·δ(i+1=j)·δ(j<n)·δ(j=1)·δ(b_2>b_1)·
        f(b_1)·f(b_2)···f(b_N)

8b  We know from step 4d above, that this joint p.d.f. will
    be returned with the additional (superfluous)
    restriction δ(i<1). Simplifying we have
        δ(i=0)·δ(j<n)·δ(j=1)·δ(b_2>b_1)·f(b_1)·f(b_2)···f(b_N)

    This joint p.d.f. is returned at 4b. It joins with the
    joint p.d.f. that "escaped". The result is:
        {δ(i=1)+δ(i=0)}·δ(j<n)·δ(j=1)·δ(b_2>b_1)·
        f(b_1)·f(b_2)···f(b_N)

9a  This statement adjusts J for the next iteration, and
        {δ(i=1)+δ(i=0)}·δ(j-1<n)·δ(j-1=1)·δ(b_2>b_1)·
        f(b_1)·f(b_2)···f(b_N)
    is again passed to OUTER.

3d  We see now that this test "traps" all of the joint
    p.d.f. in the loop until J exceeds LIM ( N-1 in our
    case ). So we won't mention the false branch until the
    end. In the true branch:
        {δ(i=1)+δ(i=0)}·δ(j<n)·δ(j=2)·δ(b_2>b_1)·
        f(b_1)·f(b_2)···f(b_N)

4a  This collapses the old joint p.d.f. on i and results in
        δ(i=j)·δ(j<n)·δ(j=2)·2·δ(b_2>b_1)·f(b_1)·f(b_2)···f(b_N)
    In the oblivious version, this was a trivial operation.
    Here it destroys information about the distribution of
    I in the last iteration.

4d  This joint p.d.f. arrives at INNER, where this
    statement controls the exit of the last of the joint
    p.d.f.

5   In the true branch:
        δ(i=j)·δ(j<n)·δ(j=2)·2·δ(b_2>b_1)·δ(b_2>b_3)·
        f(b_1)·f(b_2)···f(b_N)

    In the false branch:
        δ(i=j)·δ(j<n)·δ(j=2)·2·δ(b_2>b_1)·δ(b_3>b_2)·
        f(b_1)·f(b_2)···f(b_N)

6   The exchange yields:
        δ(i=j)·δ(j<n)·δ(j=2)·2·δ(b_3>b_1)·δ(b_3>b_2)·
        f(b_1)·f(b_2)···f(b_N)

6a  Here the false branch again escapes in the form of
        δ(i=2)·δ(j<n)·δ(j=2)·2·δ(b_2>b_1)·δ(b_3>b_2)·
        f(b_1)·f(b_2)···f(b_N)

7   At the join we have only the true branch joint p.d.f.
    left:
        δ(i=j)·δ(j<n)·δ(j=2)·2·δ(b_3>b_1)·δ(b_3>b_2)·
        f(b_1)·f(b_2)···f(b_N)

8a  Prepares for the next call of INNER
        δ(i=j-1)·δ(j<n)·δ(j=2)·2·δ(b_3>b_1)·δ(b_3>b_2)·
        f(b_1)·f(b_2)···f(b_N)
    This gets through to statement 5 in INNER.

5   In the true branch (multiply by δ(b_1>b_2) and simplify)
        δ(i=j-1)·δ(j<n)·δ(j=2)·
        2·{δ(b_1>b_2)·δ(b_3>b_1)·δ(b_3>b_2)}·
        f(b_1)·f(b_2)···f(b_N)

    In the false branch (multiply by δ(b_2>b_1), simplify):
        δ(i=j-1)·δ(j<n)·δ(j=2)·
        2·{δ(b_3>b_2)·δ(b_2>b_1)}·
        f(b_1)·f(b_2)···f(b_N)

6   The EXCHANGE in the true branch yields:
        δ(i=j-1)·δ(j<n)·δ(j=2)·
        2·{δ(b_3>b_2)·δ(b_2>b_1)}·f(b_1)·f(b_2)···f(b_N)

6a  Again the false branch joint p.d.f. escapes
        δ(i=1)·δ(j<n)·δ(j=2)·
        2·{δ(b_3>b_2)·δ(b_2>b_1)}·
        f(b_1)·f(b_2)···f(b_N)

7   At the join we have only the true branch joint p.d.f.
    left:
        δ(i=j-1)·δ(j<n)·δ(j=2)·
        2·{δ(b_3>b_2)·δ(b_2>b_1)}·f(b_1)·f(b_2)···f(b_N)

8a  Sets I to zero in this case, and the next call of INNER
    returns this joint p.d.f.
        δ(i=0)·δ(j<n)·δ(j=2)·
        2·{δ(b_3>b_2)·δ(b_2>b_1)}·f(b_1)·f(b_2)···f(b_N)
    to OUTER at statement 9a.

4b  The three sets of joint p.d.f.s meet and are added
    here. We have:
        {δ(i=0)+δ(i=1)+δ(i=2)}·δ(j<n)·δ(j=2)·
        2·{δ(b_3>b_2)·δ(b_2>b_1)}·f(b_1)·f(b_2)···f(b_N)

9a  Increments J and we get, going back into OUTER at 9b:
        {δ(i=0)+δ(i=1)+δ(i=2)}·δ(j<n+1)·δ(j=3)·
        2·{δ(b_3>b_2)·δ(b_2>b_1)}·f(b_1)·f(b_2)···f(b_N)

By now the pattern is clear. It is even easier to show
that the result at the end will be:

    {δ(i=0)+δ(i=1)+....+δ(i=N-1)}·δ(j=N)·
    (N-1)!·{δ(b_N>b_{N-1})···δ(b_2>b_1)}·f(b_1)·f(b_2)···f(b_N)

If we collapse this on i, then we get the same result as
before. Therefore, the change in the program has not
changed its ability to sort. This form tells us some other
things. Specifically, the value of I that is returned by
INNER represents the number of elements that were found to
be smaller than the (j+1)th element. It is easy to see that I
can take on exactly J+1 values from 0 to J, and that each of
those values is equally likely. This is something that one
would have expected, but we have proved it without recourse
to any elaborate combinatorial or probabilistic arguments.
The result just "fell out" of the analysis. It is easier to
write a program that can recognize that the probability
density function of a discrete variable has the same value
at each point, than to have that program say "Each I is
equally likely!"
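
The uniformity of I can also be checked by brute force over all orderings of a small list. The sketch below is an illustration only: it assumes distinct keys, uses 0-based Python lists, and introduces the hypothetical helpers improved_inner_exit and exchange_stats. The same enumeration yields the mean number of exchanges, which for distinct keys is the standard value N(N-1)/4.

```python
from fractions import Fraction
from itertools import permutations

def improved_inner_exit(b, j):
    """Run INNER with the early exit on list b for outer index j (1-based J).

    Returns (value of I at exit, number of exchanges performed)."""
    i, exchanges = j, 0
    while i >= 1:
        if b[i - 1] > b[i]:                   # B(I) > B(I+1) in 1-based terms
            b[i - 1], b[i] = b[i], b[i - 1]   # EXCHANGE
            exchanges += 1
        else:
            break                             # statement 6a: exit / return
        i -= 1
    return i, exchanges

def exchange_stats(n):
    """Mean exchanges over all orderings, and histogram of I for the last insertion."""
    perms = list(permutations(range(n)))
    total = 0
    last_i = {}
    for perm in perms:
        b = list(perm)
        for j in range(1, n):
            i, ex = improved_inner_exit(b, j)
            total += ex
        last_i[i] = last_i.get(i, 0) + 1      # I returned when J = n-1
        assert b == sorted(perm)              # the modified algorithm still sorts
    return Fraction(total, len(perms)), last_i

mean_exchanges, histogram = exchange_stats(4)
```

For four keys every value of I from 0 to 3 occurs equally often on the final insertion, and the mean number of exchanges is 3.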

The other thing that the values and p.d.f. for I tell
us is the number of exchanges that take place. From the
observation above, we get that Pr(I=i) = 1/(j+1) for each i
from 0 to j, so that the expected number of exchanges for
any value of j is

    E_j = Σ_{i=0}^{j} i/(j+1) = j/2

for the entire N elements, this is

    Σ_{j=1}^{N-1} j/2 = N(N-1)/4

which is the correct answer. This turns out to be the
expected number of comparisons, also. We can see that the
running time performance of the sort has been improved by a
factor of two.


Binary Search

We now turn our attention to the analysis of an
algorithm for a Binary Search. This particular version
closely follows one given by Horowitz and Sahni [9]. We
introduce it here for two reasons: (1) it gives us a chance
to present the case statement, and (2) it is the first
"divide and conquer" algorithm that we have considered. The
function INT returns the INTeger part of the argument (i.e.
the floor function).


 1   procedure BINARY_SEARCH ( N, I, X )
         global real K(1:N)
 2       LOW <- 1; UP <- N
 3       I <- 0
 4       SPLIT: while LOW ≤ UP do
 5           MID <- INT ( ( LOW + UP ) / 2 )
 6           case
 7               : X > K(MID) : LOW <- MID + 1
 8               : X = K(MID) : I <- MID; return
 9               : X < K(MID) : UP <- MID - 1
10           end
11       end
12   end BINARY_SEARCH


The recursive equivalent is:

 1    procedure BINARY_SEARCH ( N, I, X )
          global real K(1:N)
 2        LOW <- 1; UP <- N
 3        I <- 0
 4a       call SPLIT ( LOW, UP, X, I )
12    end BINARY_SEARCH

 4b   procedure SPLIT( LOW, UP, X, I )
 4c       if LOW ≤ UP then
 5            MID <- INT ( ( LOW + UP ) / 2 )
 6            case
 7                : X > K(MID) : LOW <- MID + 1
 8                : X = K(MID) : I <- MID; return
 9                : X < K(MID) : UP <- MID - 1
10            end
11a           call SPLIT ( LOW, UP, X, I )
11b       endif
11c       return
11d   end SPLIT


Since it is very straightforward, we will just sketch
the analysis. We start with the array K(1:N) ordered, so we
have the initial joint p.d.f.

    δ(k_1<k_2)·δ(k_2<k_3)···δ(k_{N-1}<k_N)·f(k_1)·f(k_2)···f(k_N)

The search key X is drawn from a p.d.f. g(x), and the
assignment statements 2 and 3 have their usual effect. As a
result we have SPLIT called with the joint p.d.f.

    δ(low=1)·δ(up=N)·δ(i=0)·g(x)·
    δ(k_1<k_2)·δ(k_2<k_3)···δ(k_{N-1}<k_N)·f(k_1)·f(k_2)···f(k_N)

After 4c

    δ(low≤up)·δ(low=1)·δ(up=N)·δ(i=0)·g(x)·
    δ(k_1<k_2)·δ(k_2<k_3)···δ(k_{N-1}<k_N)·f(k_1)·f(k_2)···f(k_N)

After 5

    δ(mid=⌊(1+N)/2⌋)·δ(low≤up)·
    δ(low=1)·δ(up=N)·δ(i=0)·g(x)·
    δ(k_1<k_2)·δ(k_2<k_3)···δ(k_{N-1}<k_N)·f(k_1)·f(k_2)···f(k_N)

At 6 the joint p.d.f. splits into three parts with the arms
of the case statement. The middle leg allows a portion of
the joint p.d.f. to escape back to the calling program.

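After 7

\[ \delta(x>k_{\mathrm{mid}})\cdot\delta(\mathrm{mid}=\lfloor(1+N)/2\rfloor)\cdot
   \delta(\mathrm{low}=\mathrm{mid}+1)\cdot\delta(\mathrm{up}=N)\cdot\delta(i=0)\cdot g(x)\cdot
   \delta(k_1<k_2)\cdot\delta(k_2<k_3)\cdots\delta(k_{N-1}<k_N)\cdot f(k_1)\cdot f(k_2)\cdots f(k_N) \]

After 8

\[ \delta(x=k_{\mathrm{mid}})\cdot\delta(\mathrm{mid}=\lfloor(1+N)/2\rfloor)\cdot
   \delta(\mathrm{low}=1)\cdot\delta(\mathrm{up}=N)\cdot\delta(i=\mathrm{mid})\cdot g(x)\cdot
   \delta(k_1<k_2)\cdot\delta(k_2<k_3)\cdots\delta(k_{N-1}<k_N)\cdot f(k_1)\cdot f(k_2)\cdots f(k_N) \]

After 9

\[ \delta(x<k_{\mathrm{mid}})\cdot\delta(\mathrm{mid}=\lfloor(1+N)/2\rfloor)\cdot
   \delta(\mathrm{low}=1)\cdot\delta(\mathrm{up}=\mathrm{mid}-1)\cdot\delta(i=0)\cdot g(x)\cdot
   \delta(k_1<k_2)\cdot\delta(k_2<k_3)\cdots\delta(k_{N-1}<k_N)\cdot f(k_1)\cdot f(k_2)\cdots f(k_N) \]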

The sum of the joint p.d.f. after 7 and after 9 is
presented to the next call on SPLIT. Each time SPLIT is
called, some of the joint p.d.f. escapes and is returned,
until the final return for no find. It is relatively easy
to see that the final joint p.d.f. will be

\[ \Bigl[\, \delta(i=0)\,\bigl\{ \delta(x<k_1) + \delta(x>k_1)\,\delta(x<k_2) + \cdots + \delta(x>k_N) \bigr\}
   + \sum_{\mathrm{mid}=1}^{N} \delta(i=\mathrm{mid})\,\delta(x=k_{\mathrm{mid}}) \Bigr]
   \cdot g(x)\cdot\delta(k_1<k_2)\cdot\delta(k_2<k_3)\cdots\delta(k_{N-1}<k_N)\cdot f(k_1)\cdot f(k_2)\cdots f(k_N) \]

The  behavior  of  this  joint  p.d.f.  is  dependent  on  the  form 
of  g{x).  If  this  p.d.f.  restricts  the  value  of  x  to  those 
of  the  K(M)  with  equal  probability,  then  we  see  that  any  of 
the  values  is  equally  likely.  The  behavior  of  the  number  of 
comparisons  can  be  derived  by  instrumenting  the  algorithm. 
Doing  so  results  in  the  usual  log  n  behavior. 
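
As an illustration of what "instrumenting the algorithm" might look like, the following is a minimal Python sketch of the recursive BINARY_SEARCH/SPLIT pair with a counter added. The counter `calls` and the function name `binary_search` are assumptions made for this sketch and are not part of the SPARKS program; the array is addressed with 1-based indices to match the text.

```python
def binary_search(keys, x):
    """Recursive binary search over a sorted list, mirroring SPLIT.

    Returns (i, calls): i is the 1-based position of x in keys (0 if
    x is absent) and calls is the number of times SPLIT was entered.
    """
    calls = 0  # instrumentation counter (an addition, not in the original)

    def split(low, up):
        nonlocal calls
        calls += 1
        if low <= up:
            mid = (low + up) // 2
            k = keys[mid - 1]          # K(MID), 1-based in the text
            if x > k:
                low = mid + 1
            elif x == k:
                return mid             # the "escaping" middle arm
            else:
                up = mid - 1
            return split(low, up)
        return 0                       # final return for no find

    return split(1, len(keys)), calls
```

For a sorted list of seven keys, no successful search in this sketch enters SPLIT more than three times, which is consistent with the logarithmic behavior mentioned above.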
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CHAPTER  6 


APPLICATION  TO  A  MISCELLANEOUS  PROBLEM 

We  will  now  look  at  Jonassen's  and  Knuth’s  celebrated 
"Trivial  Algorithm  Whose  Analysis  Isn't"  [8].  Ramshaw,  a 
student  of  Knuth's,  applies  his  Frequentistic  System  to  this 
algorithm  in  his  thesis  [5]  .  Jonassen  and  Knuth  did  not 
give  the  derivation  of  the  initial  recursion  relationships, 
but  derived  them  "by  reasoning  almost  directly  from  the  code 
of  the  program"  [5] .  We  now  believe  that  our  work  has 
formalized  this  "reasoning  almost  directly  from  the  code", 
because,  when  applied  to  this  algorithm,  it  proceeds 
directly  to  their  equations  2.1,  2.2,  and  2.3  [8]. 

Basically the algorithm involves the insertion and
deletion of keys in a binary tree structure. The insertion
is done with the standard binary insertion algorithm and the
deletion is done using Hibbard's algorithm [18]. The two
possible trees with two keys are called F and G. The five
possible binary trees with three keys are called A, B, C, D,
and E. With x < y < z, we have the following pictures for
these binary trees:

   A(x,y,z)    B(x,y,z)    C(x,y,z)    D(x,y,z)    E(x,y,z)

        z         z           y         x           x
       /         /           / \         \           \
      y         x           x   z         z           y
     /           \                       /             \
    x             y                     y               z


   F(x,y)      G(x,y)

      y         x
     /           \
    x             y


The insertion algorithm is the standard one for binary
insertion: the new element is appended to the tree in the
appropriate place. Hibbard's deletion algorithm proceeds in
a straightforward manner, except that the deletion of x from
D(x,y,z) results in G(y,z) instead of F(y,z), as one might
expect. The insertion and deletion algorithm is given in
detail in the program which follows. We will not go further
into the background of the algorithm. Anyone interested
should see the Jonassen and Knuth article [8], which covers
it quite nicely.

While  the  others  [5,8]  have  always  assumed  that  the 
keys  are  selected  from  a  uniform  distribution,  it  turns  out 
that  this  restriction  is  unnecessary  in  our  approach.  It  is 
only  necessary  to  have  the  keys  drawn  from  the  same, 
stationary  distribution  f(x). 


Jonassen and Knuth [8] give the graphical and word
procedure representation of the algorithm; we will only
present the algorithm as a SPARKS program. We will use
Ramshaw's [5] notation for the tuples representing the
condition of the tree. Furthermore, we will adopt the
convention that after assignment the "from" variables are
set to zero ("killed"). This is not really necessary, but
it does simplify the notation, since after the variables are
"killed" we no longer have to carry them in the joint p.d.f.
equations.

 1  procedure TRIVIAL ( N )
    /* Load the initial tree */
 2    X <- random_f; Y <- random_f
 3    if ( X < Y ) then
 4      <S;V,W> <- <G;X,Y>
 5    else
 6      <S;V,W> <- <F;Y,X>
 7    end if
    /* The main algorithm loop */
 8    for K <- 1 to N
      /* Insert a key */
 9      R <- random_f
10      case
11        : S = F and R < V     : <T;X,Y,Z> <- <A;R,V,W>
12        : S = F and V < R < W : <T;X,Y,Z> <- <B;V,R,W>
13        : S = F and W < R     : <T;X,Y,Z> <- <C;V,W,R>
14        : S = G and R < V     : <T;X,Y,Z> <- <C;R,V,W>
15        : S = G and V < R < W : <T;X,Y,Z> <- <D;V,R,W>
16        : S = G and W < R     : <T;X,Y,Z> <- <E;V,W,R>
17      end
      /* Now do the deletion */
18      L <- random_{X,Y,Z}
19      case
20        : T = A and L = X : <S;V,W> <- <F;Y,Z>
21        : T = A and L = Y : <S;V,W> <- <F;X,Z>
22        : T = A and L = Z : <S;V,W> <- <F;X,Y>
23        : T = B and L = X : <S;V,W> <- <F;Y,Z>
24        : T = B and L = Y : <S;V,W> <- <F;X,Z>
25        : T = B and L = Z : <S;V,W> <- <G;X,Y>
26        : T = C and L = X : <S;V,W> <- <G;Y,Z>
27        : T = C and L = Y : <S;V,W> <- <F;X,Z>
28        : T = C and L = Z : <S;V,W> <- <F;X,Y>
29        : T = D and L = X : <S;V,W> <- <G;Y,Z>
30        : T = D and L = Y : <S;V,W> <- <G;X,Z>
31        : T = D and L = Z : <S;V,W> <- <G;X,Y>
32        : T = E and L = X : <S;V,W> <- <G;Y,Z>
33        : T = E and L = Y : <S;V,W> <- <G;X,Z>
34        : T = E and L = Z : <S;V,W> <- <G;X,Y>
35      end
36    repeat
37  end TRIVIAL
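
The two case statements of TRIVIAL are, in effect, lookup tables on the tree shape. As a rough illustration, the following Python sketch encodes them as dictionaries and simulates the insert/delete loop. It assumes keys drawn uniformly from (0,1) through Python's `random` module; the names `INSERT`, `DELETE`, `insert`, `delete`, and `trivial` are invented for this sketch and do not appear in the source.

```python
import random

# (two-key shape, rank of the new key among the three) -> three-key shape
INSERT = {('F', 0): 'A', ('F', 1): 'B', ('F', 2): 'C',
          ('G', 0): 'C', ('G', 1): 'D', ('G', 2): 'E'}

# (three-key shape, rank of the deleted key) -> two-key shape
DELETE = {('A', 0): 'F', ('A', 1): 'F', ('A', 2): 'F',
          ('B', 0): 'F', ('B', 1): 'F', ('B', 2): 'G',
          ('C', 0): 'G', ('C', 1): 'F', ('C', 2): 'F',
          ('D', 0): 'G', ('D', 1): 'G', ('D', 2): 'G',
          ('E', 0): 'G', ('E', 1): 'G', ('E', 2): 'G'}


def insert(shape, keys, r):
    """Statements 10-17: insert key r into the two-key tree <shape; v, w>."""
    v, w = keys
    rank = 0 if r < v else (1 if r < w else 2)
    return INSERT[(shape, rank)], tuple(sorted((v, w, r)))


def delete(shape, keys, rank):
    """Statements 19-35: delete the key of the given rank (0, 1, or 2)."""
    remaining = tuple(k for j, k in enumerate(keys) if j != rank)
    return DELETE[(shape, rank)], remaining


def trivial(n, rng):
    """Run n insert/delete rounds; rng is a random.Random instance."""
    x, y = rng.random(), rng.random()
    shape, keys = ('G', (x, y)) if x < y else ('F', (y, x))
    for _ in range(n):
        t, xyz = insert(shape, keys, rng.random())
        shape, keys = delete(t, xyz, rng.randrange(3))
    return shape, keys
```

Calling `trivial(n, random.Random(seed))` many times and tallying the returned shapes gives an empirical estimate of the F and G probabilities after n rounds, which can be compared against the analysis that follows.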

The recursive version of this program is then:

 1   procedure TRIVIAL ( N )
     /* Load the initial tree */
 2     X <- random_f; Y <- random_f
 3     if ( X < Y ) then
 4       <S;V,W> <- <G;X,Y>
 5     else
 6       <S;V,W> <- <F;Y,X>
 7     end if
     /* The main algorithm loop */
 8a    K <- 1
 8b    call MAIN ( K, N )
37   end TRIVIAL

 8c  procedure MAIN ( K, N )
 8d    if ( K ≤ N ) then
       /* Insert a key */
 9       R <- random_f
10       case
11         : S = F and R < V     : <T;X,Y,Z> <- <A;R,V,W>
12         : S = F and V < R < W : <T;X,Y,Z> <- <B;V,R,W>
13         : S = F and W < R     : <T;X,Y,Z> <- <C;V,W,R>
14         : S = G and R < V     : <T;X,Y,Z> <- <C;R,V,W>
15         : S = G and V < R < W : <T;X,Y,Z> <- <D;V,R,W>
16         : S = G and W < R     : <T;X,Y,Z> <- <E;V,W,R>
17       end
       /* Now do the deletion */
18       L <- random_{X,Y,Z}
19       case
20         : T = A and L = X : <S;V,W> <- <F;Y,Z>
21         : T = A and L = Y : <S;V,W> <- <F;X,Z>
22         : T = A and L = Z : <S;V,W> <- <F;X,Y>
23         : T = B and L = X : <S;V,W> <- <F;Y,Z>
24         : T = B and L = Y : <S;V,W> <- <F;X,Z>
25         : T = B and L = Z : <S;V,W> <- <G;X,Y>
26         : T = C and L = X : <S;V,W> <- <G;Y,Z>
27         : T = C and L = Y : <S;V,W> <- <F;X,Z>
28         : T = C and L = Z : <S;V,W> <- <F;X,Y>
29         : T = D and L = X : <S;V,W> <- <G;Y,Z>
30         : T = D and L = Y : <S;V,W> <- <G;X,Z>
31         : T = D and L = Z : <S;V,W> <- <G;X,Y>
32         : T = E and L = X : <S;V,W> <- <G;Y,Z>
33         : T = E and L = Y : <S;V,W> <- <G;X,Z>
34         : T = E and L = Z : <S;V,W> <- <G;X,Y>
35       end
36a      K <- K + 1
36b      call MAIN ( K, N )
36c    end if
36d  end MAIN

The analysis is as follows:

After 2

\[ f(x)\cdot f(y) \]

After 3

\[ \delta(x<y)\cdot f(x)\cdot f(y) \]

After 4

\[ \delta(s=G)\cdot\delta(v<w)\cdot f(v)\cdot f(w) \]

After 5

\[ \delta(x>y)\cdot f(x)\cdot f(y) \]

After 6

\[ \delta(s=F)\cdot\delta(v<w)\cdot f(v)\cdot f(w) \]

After 7

\[ \{\delta(s=F)+\delta(s=G)\}\cdot\delta(v<w)\cdot f(v)\cdot f(w) \]

After 8a

\[ \delta(k=1)\cdot\{\delta(s=F)+\delta(s=G)\}\cdot\delta(v<w)\cdot f(v)\cdot f(w) \]

This is what we expected: either tree is equally
likely, and the joint p.d.f. is that of a sorted list of two
variables. Rather than continue to follow an explicit
example through the algorithm, as we have done in the past,
we will define unknown functions to represent the various
tree forms. Following these through the algorithm will
result in the recursive equations. Let

\[ \delta(k=K)\cdot\delta(v<w)\cdot\{\delta(s=F)\cdot f_k(v,w)+\delta(s=G)\cdot g_k(v,w)\} \]

represent the joint p.d.f. that is presented to each call of
the recursive subroutine MAIN. This form comes from looking
ahead and recognizing that no joint p.d.f. "leaks out" until
the end of the loop.

After 8d

\[ \delta(k\le N)\cdot\delta(k=K)\cdot\delta(v<w)\cdot\{\delta(s=F)\cdot f_k(v,w)+\delta(s=G)\cdot g_k(v,w)\} \]

After 9

\[ \delta(k\le N)\cdot\delta(k=K)\cdot\delta(v<w)\cdot\{\delta(s=F)\cdot f_k(v,w)+\delta(s=G)\cdot g_k(v,w)\}\cdot f(r) \]

In order to simplify the expressions, we will drop the
loop-counting-and-stopping factor $\delta(k\le N)\cdot\delta(k=K)$. We will
also note that $\delta(s=F)\cdot\delta(s=G) = 0$, and use this in each arm
of the case statement.

After 11

\[ \delta(s=F)\cdot f_k(v,w)\cdot f(r)\cdot\delta(v<w)\cdot\delta(r<v)\cdot
   \delta(t=A)\cdot\delta(x=r)\cdot\delta(y=v)\cdot\delta(z=w) \]

or, using the convention of "killing" the old variables,

\[ \delta(t=A)\cdot f_k(y,z)\cdot f(x)\cdot\delta(x<y<z) \]

Note that this convention simplifies the assignments to
<t;x,y,z> because the distribution of these variables is
always $\delta(s=0)\cdot\delta(v=0)\cdot\delta(w=0)$ at this point.

After 12

\[ \delta(t=B)\cdot f_k(x,z)\cdot f(y)\cdot\delta(x<y<z) \]

After 13

\[ \delta(t=C)\cdot f_k(x,y)\cdot f(z)\cdot\delta(x<y<z) \]

After 14

\[ \delta(t=C)\cdot g_k(y,z)\cdot f(x)\cdot\delta(x<y<z) \]

After 15

\[ \delta(t=D)\cdot g_k(x,z)\cdot f(y)\cdot\delta(x<y<z) \]

After 16

\[ \delta(t=E)\cdot g_k(x,y)\cdot f(z)\cdot\delta(x<y<z) \]

After 17

We have the sum of the six arms of the case statement.
It is at this point that, by looking ahead, we see that the
next general functions should be defined as:

\[ a_k(x,y,z) = f_k(y,z)\cdot f(x) \]
\[ b_k(x,y,z) = f_k(x,z)\cdot f(y) \]
\[ c_k(x,y,z) = f_k(x,y)\cdot f(z) + g_k(y,z)\cdot f(x) \]
\[ d_k(x,y,z) = g_k(x,z)\cdot f(y) \]
\[ e_k(x,y,z) = g_k(x,y)\cdot f(z) \]

With $f(x)=\delta(0<x<1)$ for a uniform distribution, these
are equations 2.1 in Jonassen and Knuth [8].

The whole joint p.d.f. after 17 is then:

\[ \{\delta(t=A)\cdot a_k(x,y,z) + \delta(t=B)\cdot b_k(x,y,z) + \delta(t=C)\cdot c_k(x,y,z)
   + \delta(t=D)\cdot d_k(x,y,z) + \delta(t=E)\cdot e_k(x,y,z)\}\cdot\delta(x<y<z) \]

After 18

\[ \{\delta(t=A)\cdot a_k(x,y,z) + \delta(t=B)\cdot b_k(x,y,z) + \delta(t=C)\cdot c_k(x,y,z)
   + \delta(t=D)\cdot d_k(x,y,z) + \delta(t=E)\cdot e_k(x,y,z)\}\cdot
   \delta(x<y<z)\cdot\{\tfrac13\delta(l=x) + \tfrac13\delta(l=y) + \tfrac13\delta(l=z)\} \]

where the last term expresses the fact that any of the
keys may be deleted with equal probability.

After 20

\[ \delta(t=A)\cdot a_k(x,y,z)\cdot\tfrac13\delta(l=x)\cdot\delta(s=F)\cdot\delta(v=y)\cdot\delta(w=z)\cdot\delta(x<y<z) \]

We now apply the convention of setting t, x, y, and z to
zero. This is done by "integration" over these variables
using Theorem 5. We will use our summation notation, which
is defined to work the same as integration if the functions
are taken to be continuous. Remember that if a variable of
integration appears in a Boolean delta function and is
equal to a free variable, then the effect is the same as a
change of variable. In this case y and z appear this way,
while x appears only with respect to other variables of
integration.

\[ \sum_{l,t,x,y,z} \{\delta(t=A)\cdot a_k(x,y,z)\cdot\tfrac13\delta(l=x)
   \cdot\delta(s=F)\cdot\delta(v=y)\cdot\delta(w=z)\cdot\delta(x<y<z)\} = \]
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CHAPTER  7 

SUMMARY  AND  CONCLUSIONS 

What  have  we  accomplished?  We  have  sketched  the 
foundation  for  a  systematic  approach  to  algorithm  analysis 
that  is  based  on  two  ideas: 

1.  Convert  all  loop  constructs  within  a  program  to 
recursive  subroutine  calls 

2.  Develop  a  representation  of  the  initial  joint  p.d.f. 
of  the  program  variables,  and  follow  the  effects 
that  the  program  has  on  that  joint  p.d.f. 

These  two  ideas  yield  recurrence  relations  for  the 
joint  p.d.f.  which  can  be  solved  to  get  the  joint  p.d.f.  at 
any  point  in  the  execution  of  the  algorithm.  The  branching 
probabilities  can  be  calculated  directly  from  the  joint 
p.d.f.  at  each  conditional  statement.  It  is  this  detailing 
of  the  branching  probabilities  that  was  missing  from  the 
automatic  analyzers  METRIC  and  EL/PL.  Therefore,  the  logical 
next  step  would  be  to  add  this  method  to  the  existing 
analyzers . 

The central addition we have made to the understanding
of the behavior of joint p.d.f.s in a program is the
introduction of the Boolean delta function. This function, by
connecting the Boolean world of the algorithmic conditional
statement to the real numbers, makes it possible to keep
track of the effects of conditional statements on the joint
p.d.f.s. Its form, essentially a list of arguments, makes
it very easy to represent and operate upon in a computer
program, especially since LISP seems to be the language most
used in this type of work.
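
To make the role of the Boolean delta function concrete, here is a small Python sketch (Python rather than LISP, purely for illustration). It treats the delta function as a 0/1 indicator and computes a branching probability by summing the masked joint distribution of two independent, identically distributed discrete variables. The names `delta`, `branch_probability`, and `uniform4` are hypothetical and chosen only for this example.

```python
from fractions import Fraction
from itertools import product


def delta(condition):
    """Boolean delta function: 1 where the condition holds, 0 elsewhere."""
    return 1 if condition else 0


def branch_probability(pmf, predicate):
    """Probability that `if predicate(X, Y)` takes its true branch.

    pmf maps each value to its probability; X and Y are assumed to be
    drawn independently from it, so the joint distribution is
    pmf[x] * pmf[y], masked by the delta function of the condition.
    """
    return sum(delta(predicate(x, y)) * pmf[x] * pmf[y]
               for x, y in product(pmf, repeat=2))


# Four equally likely key values, kept exact with Fraction.
uniform4 = {v: Fraction(1, 4) for v in range(1, 5)}
```

With this setup, `branch_probability(uniform4, lambda x, y: x < y)` sums the 6 of 16 equally likely pairs with x < y, giving 3/8.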

Our  approach,  by  capturing  the  behavior  of  the  program 
variables  in  detail,  also  includes  a  means  for  verifying  the 
performance  of  algorithms.  All  of  the  information  that  can 
be  obtained  from  previous  methods  of  program  verification 
seems  to  be  present  in  our  method. 

Despite the underlying simplicity of the ideas,
the method is very tedious to apply to any significant
algorithm. The examples given in this thesis were made
possible by the string manipulation features of a DIGITAL
WS/78 word processor. The next thing that must be done
before more useful work can be done in this area is to
automate the technique. This automated processor should be
an interactive one in the EL/PL style.

With an automatic processor, investigations can begin
into some of the simple program constructs which we have not
addressed. Multiplication, division, addition, and
subtraction of variables have not been considered. Since these
are very important parts of many algorithms, this work must be
extended to cover them before it becomes really useful.
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APPENDIX  A 


LINE-BY-LINE  ANALYSIS 
of 

"OBLIVIOUS"  INSERTION  SORT 

We must do the analysis for a specific class of initial
distributions for the problem to be tractable. Specifically,
we will assume that each element of B(1:N) is drawn
independently from a well defined, stationary p.d.f. f(b_j).
Therefore the initial joint p.d.f. is simply

\[ f_B(b_1,b_2,b_3,\ldots,b_N) = f(b_1)\cdot f(b_2)\cdots f(b_N). \]

The converted program is:

 1   procedure INSERTION SORT ( B, N )
 2     real B(1:N)
 3a    J <- 1
 3b    call OUTER ( J, N-1, B )
10   end INSERTION SORT

 3c  procedure OUTER ( J, LIM, B )
 3d    if LIM - J ≥ 0 then
 4a      I <- J
 4b      call INNER ( I, B )
 9a      J <- J + 1
 9b      call OUTER ( J, LIM, B )
 9c    end if
 9d  end OUTER

 4c  procedure INNER ( I, B )
 4d    if I ≥ 1 then
 5       if B(I) > B(I+1) then
 6         EXCHANGE ( B(I), B(I+1) )
 7       end if
 8a      I <- I - 1
 8b      call INNER ( I, B )
 8c    end if
 8d  end INNER

The  numbers  will  refer  to  the  statement  numbers  of  the 
recursive  version  of  the  algorithm. 

1  Initial joint p.d.f.

\[ f_B(b_1,b_2,b_3,\ldots,b_N) = f(b_1)\cdot f(b_2)\cdots f(b_N). \]

3a  Adds a new variable

\[ \delta(j=1)\cdot f(b_1)\cdot f(b_2)\cdots f(b_N). \]

3d  Splits the distribution based on the values of J and
LIM.

In the true branch:

\[ \delta(j<n)\cdot\delta(j=1)\cdot f(b_1)\cdot f(b_2)\cdots f(b_N) \]

In the false branch:

\[ \delta(j\ge n)\cdot\delta(j=1)\cdot f(b_1)\cdot f(b_2)\cdots f(b_N) \]

We have made the substitutions of the instances of the
dummy variables in the routine. Now, if N = 1, then
the true branch is zero, the false branch reduces to
$\delta(j=1)\cdot f(b_1)$, and we are done.

4a  Adds a new variable in the true branch

\[ \delta(i=j)\cdot\delta(j<n)\cdot\delta(j=1)\cdot f(b_1)\cdot f(b_2)\cdots f(b_N). \]

This joint p.d.f. is transferred with the call at 4b.

4d  Splits the distribution based on the value of I.

In the true branch:

\[ \delta(i\ge 1)\cdot\delta(i=j)\cdot\delta(j<n)\cdot\delta(j=1)\cdot f(b_1)\cdot f(b_2)\cdots f(b_N). \]

In the false branch:

\[ \delta(i<1)\cdot\delta(i=j)\cdot\delta(j<n)\cdot\delta(j=1)\cdot f(b_1)\cdot f(b_2)\cdots f(b_N). \]

5  Finally things get interesting! This is the first test
involving the data itself. This statement splits the
joint p.d.f. on the basis of the values of B(I) and
B(I+1).

In the true branch:

\[ \delta(i\ge 1)\cdot\delta(i=j)\cdot\delta(j<n)\cdot\delta(j=1)\cdot\delta(b_1>b_2)\cdot f(b_1)\cdot f(b_2)\cdots f(b_N). \]

In the false branch:

\[ \delta(i\ge 1)\cdot\delta(i=j)\cdot\delta(j<n)\cdot\delta(j=1)\cdot\delta(b_2>b_1)\cdot f(b_1)\cdot f(b_2)\cdots f(b_N). \]

6  This EXCHANGES the values of b_1 and b_2

\[ \delta(i\ge 1)\cdot\delta(i=j)\cdot\delta(j<n)\cdot\delta(j=1)\cdot\delta(b_2>b_1)\cdot f(b_2)\cdot f(b_1)\cdots f(b_N). \]

7  At the join for the if statement we have

\[ \delta(i\ge 1)\cdot\delta(i=j)\cdot\delta(j<n)\cdot\delta(j=1)\cdot
   \{\delta(b_2>b_1)+\delta(b_2>b_1)\}\cdot f(b_1)\cdot f(b_2)\cdots f(b_N). \]

It is now that we can see the significance of our
choice of initial joint p.d.f., which is symmetric with
respect to the exchange of variable indices.

At this point we must decide whether the probability
that b_i = b_j is going to be significant, or not. If we choose
to deal with continuous distributions, then this probability
is zero. Likewise, if we say that the discrete elements are
distinct we have the same thing. We will do this so that we
can write the joined joint p.d.f. as

\[ \delta(i\ge 1)\cdot\delta(i=j)\cdot\delta(j<n)\cdot\delta(j=1)\cdot 2\cdot\delta(b_2>b_1)\cdot
   f(b_1)\cdot f(b_2)\cdots f(b_N) \]

8a  This adjusts I for the next iteration

\[ \delta(i+1\ge 1)\cdot\delta(i+1=j)\cdot\delta(j<n)\cdot\delta(j=1)\cdot
   2\cdot\delta(b_2>b_1)\cdot f(b_1)\cdot f(b_2)\cdots f(b_N) \]

8b  We know from step 4d above that this joint p.d.f. will
be returned with the additional (superfluous)
restriction $\delta(i<1)$. Simplifying, we have

\[ \delta(i=0)\cdot\delta(j<n)\cdot\delta(j=1)\cdot 2\cdot\delta(b_2>b_1)\cdot f(b_1)\cdot f(b_2)\cdots f(b_N) \]

This joint p.d.f. is returned at 4b.

9a  This statement adjusts J for the next iteration, and

\[ \delta(i=0)\cdot\delta(j-1<n)\cdot\delta(j-1=1)\cdot 2\cdot\delta(b_2>b_1)\cdot f(b_1)\cdot f(b_2)\cdots f(b_N) \]

is again passed to OUTER.

3d  We see now that this test "traps" all of the joint
p.d.f. in the loop until J exceeds LIM (N-1 in our
case). So we won't mention the false branch until the
end.

In the true branch:

\[ \delta(j<n)\cdot\delta(i=0)\cdot\delta(j-1<n)\cdot\delta(j-1=1)\cdot
   2\cdot\delta(b_2>b_1)\cdot f(b_1)\cdot f(b_2)\cdots f(b_N) \]

4a  This collapses the old joint p.d.f. on i and results in

\[ \delta(i=j)\cdot\delta(j<n)\cdot\delta(j=2)\cdot 2\cdot\delta(b_2>b_1)\cdot f(b_1)\cdot f(b_2)\cdots f(b_N) \]

We have simplified the expression with respect to j.

4d  This joint p.d.f. arrives at INNER, where this
statement traps the joint p.d.f. until I < 1.

5  In the true branch:

\[ \delta(i=j)\cdot\delta(j<n)\cdot\delta(j=2)\cdot 2\cdot\delta(b_2>b_1)\cdot\delta(b_2>b_3)\cdot
   f(b_1)\cdot f(b_2)\cdots f(b_N) \]

In the false branch:

\[ \delta(i=j)\cdot\delta(j<n)\cdot\delta(j=2)\cdot 2\cdot\delta(b_2>b_1)\cdot\delta(b_3>b_2)\cdot
   f(b_1)\cdot f(b_2)\cdots f(b_N) \]

6  The exchange yields:

\[ \delta(i=j)\cdot\delta(j<n)\cdot\delta(j=2)\cdot 2\cdot\delta(b_3>b_1)\cdot\delta(b_3>b_2)\cdot
   f(b_1)\cdot f(b_2)\cdots f(b_N) \]

7  At the join we have:

\[ \delta(i=j)\cdot\delta(j<n)\cdot\delta(j=2)\cdot
   2\cdot\{\delta(b_2>b_1)\cdot\delta(b_3>b_2)+\delta(b_3>b_1)\cdot\delta(b_3>b_2)\}\cdot
   f(b_1)\cdot f(b_2)\cdots f(b_N) \]

8a  Prepares for the next call of INNER

\[ \delta(i=j-1)\cdot\delta(j<n)\cdot\delta(j=2)\cdot
   2\cdot\{\delta(b_2>b_1)\cdot\delta(b_3>b_2)+\delta(b_3>b_1)\cdot\delta(b_3>b_2)\}\cdot
   f(b_1)\cdot f(b_2)\cdots f(b_N) \]

This gets through to statement 5 in INNER.

5  In the true branch (multiply by $\delta(b_1>b_2)$ and simplify):

\[ \delta(i=j-1)\cdot\delta(j<n)\cdot\delta(j=2)\cdot
   2\cdot\{\delta(b_1>b_2)\cdot\delta(b_3>b_1)\cdot\delta(b_3>b_2)\}\cdot f(b_1)\cdot f(b_2)\cdots f(b_N) \]

In the false branch (multiply by $\delta(b_2>b_1)$ and simplify):

\[ \delta(i=j-1)\cdot\delta(j<n)\cdot\delta(j=2)\cdot
   2\cdot\{\delta(b_2>b_1)\cdot\delta(b_3>b_2)+\delta(b_2>b_1)\cdot\delta(b_3>b_1)\cdot\delta(b_3>b_2)\}\cdot
   f(b_1)\cdot f(b_2)\cdots f(b_N) \]
\[ = \delta(i=j-1)\cdot\delta(j<n)\cdot\delta(j=2)\cdot
   2\cdot\{2\cdot\delta(b_3>b_2)\cdot\delta(b_2>b_1)\}\cdot f(b_1)\cdot f(b_2)\cdots f(b_N) \]

6  The EXCHANGE in the true branch yields:

\[ \delta(i=j-1)\cdot\delta(j<n)\cdot\delta(j=2)\cdot
   2\cdot\{\delta(b_2>b_1)\cdot\delta(b_3>b_2)\cdot\delta(b_3>b_1)\}\cdot f(b_1)\cdot f(b_2)\cdots f(b_N) \]
\[ = \delta(i=j-1)\cdot\delta(j<n)\cdot\delta(j=2)\cdot
   2\cdot\{\delta(b_3>b_2)\cdot\delta(b_2>b_1)\}\cdot f(b_1)\cdot f(b_2)\cdots f(b_N) \]

7  At the join we have:

\[ \delta(i=j-1)\cdot\delta(j<n)\cdot\delta(j=2)\cdot
   2\cdot\{3\cdot\delta(b_3>b_2)\cdot\delta(b_2>b_1)\}\cdot f(b_1)\cdot f(b_2)\cdots f(b_N) \]

8a  Sets I to zero in this case, and the next call of INNER
returns this joint p.d.f.

\[ \delta(i=0)\cdot\delta(j<n)\cdot\delta(j=2)\cdot
   2\cdot\{3\cdot\delta(b_3>b_2)\cdot\delta(b_2>b_1)\}\cdot f(b_1)\cdot f(b_2)\cdots f(b_N) \]

to OUTER at statement 9a.

This suggests the induction hypothesis that if you give
INNER, at its call from OUTER, the distribution

\[ \delta(i=j)\cdot\delta(j<n)\cdot\delta(j=k)\cdot
   k!\cdot\delta(b_k>b_{k-1})\cdots\delta(b_2>b_1)\cdot f(b_1)\cdot f(b_2)\cdots f(b_N) \]

it returns the distribution

\[ \delta(i=0)\cdot\delta(j<n)\cdot\delta(j=k)\cdot
   (k+1)!\cdot\delta(b_{k+1}>b_k)\cdots\delta(b_2>b_1)\cdot f(b_1)\cdot f(b_2)\cdots f(b_N) \]

This can be shown to be true in a straightforward, if
somewhat tedious, manner.

OUTER's "loop-stopper" releases this joint p.d.f. when
j = N and we have the result:

\[ \delta(i=0)\cdot\delta(j=N)\cdot N!\cdot\delta(b_N>b_{N-1})\cdots\delta(b_2>b_1)\cdot
   f(b_1)\cdot f(b_2)\cdots f(b_N) \]

This  is  precisely  the  proper  answer  which  is  usually  derived 
using  combinatorial  arguments  [12] .  It  may  be  easier  to 
implement  this  method  of  analysis,  even  though  it  requires 
an  induction  proof  solver,  than  to  automate  the  rules  of 
combinatorial  arguments  and  proofs.  It  should  also  be  noted 
that  at  every  step  of  the  way  we  had  a  precise  expression 
for  the  performance  of  the  program.  The  marginal  p.d.f.  for 
any  program  variable  gives  the  probability  that  the  variable 
will  take  on  a  particular  value. 

Once the analysis of the bare algorithm is complete, an
analysis for any particular aspect can be done by
instrumenting the algorithm. It is easy to show that this
algorithm requires exactly $(N^2-N)/2$ comparisons between the
elements, which is twice as many as the "improved" version
of the algorithm.
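
As a hedged illustration of such instrumentation, the sketch below transcribes the recursive OUTER/INNER structure into Python and counts element comparisons. The function name `insertion_sort` and the counter `comparisons` are assumptions of this sketch; the list is 0-based, so B(I) in the text corresponds to `b[i - 1]` here.

```python
def insertion_sort(values):
    """Recursive "oblivious" insertion sort with a comparison counter.

    Returns (sorted_list, comparisons). The structure follows the
    converted program: OUTER drives J from 1 to N-1 and INNER walks I
    from J down to 1, comparing B(I) with B(I+1) on every step.
    """
    b = list(values)
    n = len(b)
    comparisons = 0  # instrumentation, not part of the original program

    def inner(i):
        nonlocal comparisons
        if i >= 1:
            comparisons += 1
            if b[i - 1] > b[i]:                    # B(I) > B(I+1)
                b[i - 1], b[i] = b[i], b[i - 1]    # EXCHANGE
            inner(i - 1)

    def outer(j, lim):
        if lim - j >= 0:
            inner(j)
            outer(j + 1, lim)

    outer(1, n - 1)
    return b, comparisons
```

Because INNER never stops early, the count depends only on N: sorting any five elements with this sketch performs (25 - 5)/2 = 10 comparisons.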
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ALGORITHMIC  COMPLEXITY 
Part  5 


by 

Philip  J.  Janus 


ADAPTIVE  METHODS  FOR  UNKNOWN  DISTRIBUTIONS 
IN  DISTRIBUTIVE  PARTITIONING  SORTING 

ABSTRACT 


Distributive Partitioning Sorting (DPS) is a new,
innovative, practical method to sort a set of items on a
computer. This method has been shown to be biased toward
uniformly distributed data, performing poorly on skewed
distributions. The purpose of this work is to find adaptive
methods of DPS which will sort any unknown distribution equally
well and remain competitive with DPS.

Two adaptive methods were developed and thoroughly tested:
the Ranking Method and the Cumulative Distribution Function
(CDF) Method. These methods transform unknown distributions
into uniform distributions, and then perform the sorting.
After an implementation of DPS was benchmarked against
Quicksort, experiments were run on four distributions (Uniform,
Normal, Poisson, and Exponential) using four algorithms (two
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versions  of  DPS,  Ranking,  and  CDF).  Statistics  were  taken  to 
measure  the  efficiencies  and  run  times  of  the  algorithms.  The 
results  were  analyzed  against  theoretical  and  intuitive 
expectations  so  that  conclusions  could  be  reached  regarding  the 
performance  of  the  methods. 

It  was  found  that  if  it  is  known  in  advance  that  the  data 
distribution  will  typically  be  uniform,  normal,  or  slightly 
skewed,  then  it  is  advisable  to  use  DPS.  However,  if  it  is 

possible  the  data  distribution  might  be  very  skewed,  or 
extremely  large  or  small  data  values  exist  relative  to  the  rest 
of  the  data,  then  there  is  little  to  lose  and  much  to  gain  by 

using  the  CDF  adaptive  method.  CDFDPS  contained  only  a  2%  to 
overhead  to  DPS  in  the  uniform  case,  and  ran  up  to  1?* 

better  for  30,000  items  than  DPS  on  exponentially  distributed 
data.  The  ranking  method  was  found  to  contain  too  much 
overhead  to  be  competitive  with  DPS.  Suggestions  to  further 

improve  CDF  are  made,  and  future  implications  of  this  thesis 

work  are  discussed. 
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PREFACE 


A large percentage of the time in data processing
applications is spent sorting data. For that reason, it is not
surprising that sorting is the most widely studied problem in
computer science. The faster data can be sorted, the more
computer time and money can be saved.

The history of the sorting problem is long and interesting.
As expected in any field of study, once an algorithm has been
developed, somebody tries to find a better one. This is true
of a sorting method called Quicksort, developed in 1962.
Quicksort had been shown to perform fastest on most machines
once some modifications were made to it.

This was the case until 1978. In January of that year, a
Polish computer scientist named Wlodzimierz Dobosiewicz
published a paper in Information Processing Letters detailing a
sorting algorithm called Distributive Partitioning Sorting, or
DPS. This new sorting method was shown to perform much better
than Quicksort. Very soon afterward, debate began as to its
true practicality and significance.

And as could be expected, people started looking for ways
to improve it. This thesis takes an in-depth look at DPS
and the various problems associated with it. The main focus of
this work is to improve the overall performance of the
algorithm. The author pleads guilty to first degree
improvement.
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CHAPTER  I 


DISCUSSION  OF  PRIOR  WORK 

I.1  Introducing DPS

In 1978, a new sorting method called Distributive
Partitioning Sorting, or DPS, was presented to the world. The
algorithm was published in the European periodical Information
Processing Letters by a Polish computer science student named
Wlodzimierz Dobosiewicz (pronounced Vod-jim'-yits Do-bo'-shev-its)
[DOBO78a]. The article detailed a fast, practical, O(n)
sorting algorithm that could outperform current "fastest"
methods. The results of the paper were so astounding that
Datamation proclaimed it "the first real innovation in (sorting)
in about 15 years!" [DATA78]. Experimental results on a CDC
computer found DPS to be 30 times faster for 5000 items than
its nearest competitor, Quicksort. It has also been shown that
this factor increases as the number of items increases. The
potential for saving computer time and money using DPS is great.

In Distributive Partitioning Sorting it can be shown that
if the data is uniformly distributed, the expected complexity
of the algorithm is O(n); that is, the expected running time of
the program is cn, where c is a constant that is multiplied by
n, the number of items to be sorted. The drawback with DPS is
that it is much slower in the worst case. Although DPS has an
O(n) expected case complexity, it is O(n log n) in the worst
case. DPS approaches the worst case as the data distribution
becomes more and more skewed, such as with a series of
factorials. The purpose of this work is to show that the
performance of DPS can be improved for unknown distributions.
The algorithm would then be guaranteed to outperform its
competitors for any input distribution.

I.2  Definition of Sorting

First it is necessary to define sorting, and the
limitations involved with it. Sorting, as the word implies, is
the arranging of a set of data into some prescribed order. In
his famous book, Sorting and Searching [KNUT73], Knuth
rigorously defines sorting in the following manner:

Suppose n items are given

\[ R_1, R_2, \ldots, R_n \]

called records, to be sorted in either ascending or descending
order. The records collectively are called a file. Each
record, R_j, has a piece of information called a key field,
K_j, on which the record is to be sorted.

A linear ordering is defined on the keys with two
relational laws. Given three keys a, b, and c:

i)  Law of Trichotomy:
    Exactly one of
        a < b,   a = b,   a > b
    must be true.

ii) Law of Transitivity:
    If a < b and b < c
    then a < c.

Governed by this linear ordering, the goal of sorting is to
rearrange the keys into a permutation

\[ p(1), p(2), \ldots, p(n) \]

such that

\[ K_{p(1)} \le K_{p(2)} \le \cdots \le K_{p(n)} \]
The  analysis  of  the  various  sorting  algorithms  in  this 
paper  will  be  concerned  with  a  number  of  criteria  on  which  a 
method's  performance  may  be  judged.  An  algorithm  should  be 
shown  to  work  correctly  for  all  types  of  expected  inputs.  The 
amount  of  work  done  and  the  amount  of  storage  used  should  also 
be  considered.  Equally  important  is  whether  the  resulting 
program  is  simple  and  lends  itself  to  being  easily  understood, 
modified,  and  debugged.  Lastly  it  should  be  seen  if  the  method 
is  optimal;  that  is,  if  another  method  exists  which  does  less 
work  or  uses  less  space. 

For example, consider the optimality of the sorting
problem. We would like to know how many comparisons are
necessary to sort a set of n items in a comparison-based
sorting method. This means establishing a lower bound for such
methods. For any comparison method, a comparison tree can be
constructed.

Figure I.1 shows a comparison tree for three items. Each
internal node represents a comparison and the lowest level
nodes (leaves) show the possible outcomes. Note that if we are
given n items to sort, there are n factorial (n!) possible
outcomes. Hence 3!, or 6, leaves are in the example tree.

[Figure I.1: Comparison Tree]

Also note that in any binary tree of height k levels,
there are at most $2^k$ leaves. In comparison trees, the leaves
are the possible outcomes of the sort. So:

$$2^{k} \ge n! \qquad (1)$$

Let C(n) be the minimum number of comparisons necessary in
the worst case. This corresponds to a path that is followed
down to an outcome, which is just the height of the tree. So:

$$k = C(n) \qquad (2)$$

and then

$$2^{C(n)} \ge n! \qquad (3)$$

Taking the $\log_2$ of both sides

$$C(n) \ge \log_2 n! \qquad (4)$$

Approximating this using Stirling's formula gives

$$\log_2 n! \approx n \log n - n/\ln 2 + \tfrac{1}{2} \log n + a \qquad (5)$$

where a is a constant. This shows a lower bound of (n log n) on C(n).


This  means  that  no  comparison  based  method  will  work  in 
less  than  (n  log  n)  comparisons,  and  that  any  comparison  based 
method  attaining  (n  log  n)  comparisons  is  considered  optimal. 
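
As a quick numerical check of bounds (1) through (5), the sketch below computes the exact minimum tree height and compares the exact value of log n! with the Stirling approximation. The function names are illustrative only, and the constant a is assumed to be (1/2) log 2π, which is the value given by the standard form of Stirling's formula rather than one stated in the text.

```python
import math


def min_comparisons(n: int) -> int:
    """Smallest k with 2**k >= n!, i.e. ceil(log2(n!)), computed exactly."""
    return (math.factorial(n) - 1).bit_length()


def stirling_log2_factorial(n: int) -> float:
    """Right-hand side of (5) with a = 0.5 * log2(2*pi) (an assumption)."""
    a = 0.5 * math.log2(2 * math.pi)
    return n * math.log2(n) - n / math.log(2) + 0.5 * math.log2(n) + a


if __name__ == "__main__":
    for n in (3, 16, 100, 1000):
        exact = math.log2(math.factorial(n))
        print(n, min_comparisons(n), round(exact, 2),
              round(stirling_log2_factorial(n), 2))
```

For n = 3 the bound is 3 comparisons, which agrees with the height of a comparison tree with 6 leaves.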


Unless otherwise stated, all logs will imply $\log_2$.

Distributive Partitioning Sorting is a union of two classes
of sorts: partition-exchange sorts and distributive bucket
sorts. These classes of sorts will now be discussed, and then
DPS will be presented.

1.3  Partition-Exchange  Sorting 

A number of sorting algorithms use an approach known as
Exchange Sorting. This class of methods uses the idea that if
two keys are found to be out of order, then the records are
exchanged. The positions in the file of the elements being
exchanged can also be thought of as being swapped or
interchanged. The exchanging continues until no more pairs are
found to be out of order and the entire file is sorted.

One such exchange sort is called Quicksort. This
algorithm was first presented by C.A.R. Hoare in 1962 in a very
detailed paper, and has been the subject of very close study
[HOAR62]. Briefly, Quicksort works as follows:

Given an array $A_1, A_2, \ldots, A_n$ to be sorted

Partition: Position some key, $A_j$, in its final
position, so that the file is divided into two parts.
$A_j$ is the partitioning element. All the items in the
left subfile $A_1, A_2, \ldots, A_{j-1}$ are less than
$A_j$, and all the items in the right subfile
$A_{j+1}, \ldots, A_n$ are greater than $A_j$.

Recurse: Now the problem reduces to Quicksorting the
two resulting subfiles until the subfiles have one
element left.
Algorithm Quicksort is presented in Figure 1.2.

It is fitting here to walk through an example of
QUICKSORT. Suppose the elements to be sorted are those
frequently used by Knuth.

503  87  512  61  908  170  897  275  653  426  154  509  612  677  765  703
left = 1  right = 16

On the first pass through QUICKSORT
part = 503

503  87  512  61  908  170  897  275  653  426  154  509  612  677  765  703
i = 3, j = 11: exchange 512 and 154

503  87  154  61  908  170  897  275  653  426  512  509  612  677  765  703
i = 5, j = 10: exchange 908 and 426

503  87  154  61  426  170  897  275  653  908  512  509  612  677  765  703
i = 7, j = 8: exchange 897 and 275

503  87  154  61  426  170  275  897  653  908  512  509  612  677  765  703
j = 7, i = 8: the pointers have crossed

At this point the final position of the partition element,
503, has been found.

275  87  154  61  426  170  503  897  653  908  512  509  612  677  765  703
QUICKSORT on positions 1 to 6 and QUICKSORT on positions 8 to 16

Now QUICKSORT is performed recursively on the two resulting
subfiles. One more pass on the smaller subfile will be
illustrated.

part = 275

275  87  154  61  426  170
i = 5, j = 6: exchange 426 and 170

275  87  154  61  170  426
j = 5, i = 6: the pointers have crossed

170  87  154  61  275  426
QUICKSORT on positions 1 to 4 and QUICKSORT on position 6

Note that 275, 426, and 503 are now sorted.
Figure 1.2
Algorithm Quicksort

The swap operator :=: is used, where a :=: b means swap the
values of a and b.

procedure QUICKSORT(left, right)
// Sort array A bounded by A(left) to A(right)
integer left, right
real array A
// Partition
if right > left
then do
    i ← left
    j ← right+1
    part ← A(left)    // partition element
    // Burn the candle at both ends until
    // the position of the partition element is found
    do until j < i
        do i ← i+1 while A(i) < part & i < right
        do j ← j-1 while A(j) > part & j > left
        if j > i then A(i) :=: A(j)
    end
    A(left) :=: A(j)
    // Recurse on the two subfiles
    QUICKSORT(left, j-1)
    QUICKSORT(j+1, right)
end
end QUICKSORT
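
The partition-and-recurse scheme of Figure 1.2 can be sketched in Python as follows. This is an illustration under stated assumptions rather than a transcription: indices are 0-based, the bounds test is made before the element test so that no access falls outside the array, and the scan stops as soon as the two pointers meet or cross. The names `partition` and `quicksort` are chosen here for illustration.

```python
def partition(a, left, right):
    """Place a[left] in its final position; return that position."""
    part = a[left]                      # partition element
    i, j = left, right + 1
    while True:
        i += 1                          # scan from the left end
        while i < right and a[i] < part:
            i += 1
        j -= 1                          # scan from the right end
        while j > left and a[j] > part:
            j -= 1
        if j <= i:                      # pointers met or crossed
            break
        a[i], a[j] = a[j], a[i]         # exchange the out-of-order pair
    a[left], a[j] = a[j], a[left]
    return j


def quicksort(a, left=0, right=None):
    """Sort a[left..right] in place."""
    if right is None:
        right = len(a) - 1
    if right > left:
        j = partition(a, left, right)
        quicksort(a, left, j - 1)
        quicksort(a, j + 1, right)


if __name__ == "__main__":
    keys = [503, 87, 512, 61, 908, 170, 897, 275,
            653, 426, 154, 509, 612, 677, 765, 703]
    quicksort(keys)
    print(keys)
```

On Knuth's sixteen keys, the first call to `partition` performs the same three exchanges as the walk-through above and leaves 503 in the seventh position.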
This divide and conquer approach to sorting achieves
the desired result fairly rapidly. In fact, Quicksort is by
far the fastest sorting method available for implementation
on most computers [SEDG78]. This suggests that an analysis
of the running time of Quicksort is appropriate to explain
why this is so.

The worst case for Quicksort is when the file is
already sorted. This is because the partitioning element
chosen each time leaves one subfile with a single element
and the other subfile with the rest of the elements.
But suppose a good choice is made of a partitioning element
so that half go to one subfile and half go to the other. In
each successive pass, the algorithm has two subfiles of n/2
elements to sort. Since the amount of time spent to find
the position of the partition element is O(n), the number of
comparisons to Quicksort is then

$$C(n) \le n + 2C(n/2) \qquad (1)$$

Solving the recurrence relation

$$
\begin{aligned}
C(n) &\le n + 2\,(n/2 + 2C(n/4)) && \text{expanding at } n/2 \\
     &\le 2n + 4C(n/4) && (2) \\
     &\le 2n + 4\,(n/4 + 2C(n/8)) && \text{expanding at } n/4 \\
     &\le 3n + 8C(n/8) && (3) \\
     &\ \ \vdots \\
C(n) &\le kn + 2^{k}\,C(n/2^{k}) && (4)
\end{aligned}
$$

The time to sort one element is zero

$$C(1) = 0 \qquad (5)$$

so we want to stop at $n/2^{k} = 1$ or $2^{k} = n$ or

$$k = \log n \qquad (6)$$

Substituting (5) and (6) back into (4)

$$C(n) \le n \log n + 2^{\log n}\,C(1) = O(n \log n)$$
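
The closed form can be checked by evaluating the recurrence directly. The sketch below treats (1) as an equality and restricts n to powers of two so that every split is exact; the function name is illustrative.

```python
def comparisons_even_split(n: int) -> int:
    """Evaluate C(n) = n + 2*C(n/2) with C(1) = 0, for n a power of two."""
    if n <= 1:
        return 0
    return n + 2 * comparisons_even_split(n // 2)


if __name__ == "__main__":
    for k in range(0, 11):
        n = 2 ** k
        # With k = log n, the recurrence unrolls to exactly n * k.
        print(n, comparisons_even_split(n), n * k)
```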

So in a good case Quicksort is O(n log n). A rigorous analysis
of the expected case shows that Quicksort is also O(n log n)
[BAAS78]. As it turns out, Quicksort performs faster than any
other sort on most machines. This is due to a low constant of
proportionality in the O(n log n). But it is also due to some
modifications that can be made to the original algorithm. With
these changes Quicksort has evolved to its present day accepted
implementation [SEDG78].

The  first  modification  pertains  to  Quicksort's  worst  case 
problems.  Recall  that  the  worst  case  is  when  the  file  is 
already  sorted.  This  paradox  arises  because  on  each  pass  the 
partitioning  element  divides  the  file  into  one  subfile  with  one 
element  and  a  second  subfile  with  n-1  elements.  This  yields  the 
number  of  comparisons 

$$C(n) = \sum_{k=2}^{n} (k-1) = \frac{n(n-1)}{2}$$

which is $O(n^2)$.
The way to improve this worst case is by choosing a better
partitioning element on each pass. As it has been seen, it
would be nice if the partition divided the file in half, that
is, if it is the median element. Finding the exact median is
time consuming, although it can be done using 1.5n comparisons
in the expected case [FLOY75]. A quick way to get an
acceptable partitioning element is to choose the "median of
three" elements:

A(left)    A((left + right)/2)    A(right)

where left and right are the lower and upper bounds of the
array A. It has been found that this modification not only
improves the worst case significantly, but also improves the
average case by 5%.
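
A minimal sketch of the median-of-three choice is given below. It only selects the index of the partitioning element; how that element is then moved into place is left to the surrounding Quicksort. The function name and the 0-based indexing are assumptions made for the illustration.

```python
def median_of_three_index(a, left, right):
    """Index of the median of a[left], a[(left+right)//2], a[right]."""
    mid = (left + right) // 2
    # Pair each candidate key with its index and take the middle pair.
    candidates = sorted([(a[left], left), (a[mid], mid), (a[right], right)])
    return candidates[1][1]


if __name__ == "__main__":
    already_sorted = list(range(10))
    # For a sorted file the middle element is chosen, not the first one.
    print(median_of_three_index(already_sorted, 0, 9))
```

For a file that is already sorted, the chosen element is the middle one, so the partition splits the file roughly in half instead of peeling off one element per pass.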

Another problem with Quicksort is in sorting small
subfiles. The algorithm spends much time partitioning,
comparing, and exchanging elements. It seems worthwhile to use
some other sorting method which is more efficient on these
smaller files. The idea is to stop Quicksorting any subfile
with fewer than some number of elements. The resulting file
then contains a series of unsorted subfiles in their relative
correct order. An Insertionsort (a simple sorting method) then
completes the sorting by ordering these small individual groups
as it scans through the file. It has been shown that there is
at least a 15% savings when Insertionsort is used on subfiles
with at most 9 items in them.
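
The small-subfile idea can be sketched as a Quicksort that simply ignores subfiles of at most `cutoff` items, followed by one Insertionsort pass over the whole file. This is an illustration only, not Sedgewick's implementation; the function names and the default cutoff of 9 (taken from the figure quoted above) are assumptions.

```python
def insertion_sort(a):
    """Straight insertion; fast when every item is near its final place."""
    for k in range(1, len(a)):
        key = a[k]
        p = k - 1
        while p >= 0 and a[p] > key:
            a[p + 1] = a[p]
            p -= 1
        a[p + 1] = key


def partial_quicksort(a, left, right, cutoff):
    """Quicksort that leaves subfiles of at most `cutoff` items unsorted."""
    if right - left + 1 <= cutoff:
        return
    part = a[left]
    i, j = left, right + 1
    while True:
        i += 1
        while i < right and a[i] < part:
            i += 1
        j -= 1
        while j > left and a[j] > part:
            j -= 1
        if j <= i:
            break
        a[i], a[j] = a[j], a[i]
    a[left], a[j] = a[j], a[left]
    partial_quicksort(a, left, j - 1, cutoff)
    partial_quicksort(a, j + 1, right, cutoff)


def quicksort_with_cutoff(a, cutoff=9):
    """Partition down to small subfiles, then finish with one insertion pass."""
    partial_quicksort(a, 0, len(a) - 1, cutoff)
    insertion_sort(a)
```

After `partial_quicksort` returns, every item is already inside the small group where it belongs, so the final insertion pass never moves an item farther than the width of one group.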
The last major problem with Quicksort is that it is
recursive. Recursive procedures run very slowly on most
computers. Recursion can be removed by pushing and popping the
left and right endpoints of the subfiles to be sorted on a
Last-In-First-Out stack. If the smaller subfile is sorted
first, it can be shown that the maximum stack depth never
exceeds

$\log\bigl((n+1)/(M+2)\bigr)$ where M is the small file cutoff.

If M = 9 and a maximum depth of 20 is assumed, then Quicksort can
handle up to 10,000,000 elements! And the sorting is done with
very little extra storage.
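
The stack-based removal of recursion can be sketched as follows. The larger subfile is pushed and the smaller one is processed immediately, which keeps the stack shallow. This sketch has no small-file cutoff, so the bound it can be checked against is log n rather than the expression above; the function names are illustrative.

```python
def partition(a, left, right):
    """Place a[left] in its final position; return that position."""
    part = a[left]
    i, j = left, right + 1
    while True:
        i += 1
        while i < right and a[i] < part:
            i += 1
        j -= 1
        while j > left and a[j] > part:
            j -= 1
        if j <= i:
            break
        a[i], a[j] = a[j], a[i]
    a[left], a[j] = a[j], a[left]
    return j


def quicksort_iterative(a):
    """Sort a in place without recursion; return the deepest stack used."""
    stack = []                      # LIFO stack of (left, right) endpoints
    max_depth = 0
    left, right = 0, len(a) - 1
    while True:
        while right > left:
            j = partition(a, left, right)
            if j - left < right - j:
                stack.append((j + 1, right))   # push the larger subfile
                right = j - 1                  # continue with the smaller
            else:
                stack.append((left, j - 1))
                left = j + 1
            max_depth = max(max_depth, len(stack))
        if not stack:
            return max_depth
        left, right = stack.pop()
```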

With these three modifications, Quicksort's run time can
be improved by 20% over Hoare's original
algorithm. This version has been implemented by R. Sedgewick
and verified. The Sedgewick implementation of Quicksort will
be used in this thesis work as a benchmark for DPS.

Quicksort is still not free of problems, however. The
algorithm is unstable; that is, records with equal
keys do not maintain their relative order. In most
applications this is not a crucial factor, but it is an
important consideration when implementing Quicksort. Although
the median-of-three modification is made to improve the worst
case, it is nonetheless unfortunate that the worst case is
still $O(n^2)$.
Quicksort's disadvantages are far outweighed by its
advantages. Aside from being practical and extremely fast, it
uses relatively little extra space. With a day or two of
effort, a working version of Quicksort can be implemented. All
things considered, it is hard to beat. The reader is referred
to [AHO74, BAAS78, GOOD77, HORO76, HORO78, KNUT73, FRAZ70,
GRIF70, HOAR62, LOES74, SEDG78, SING69] for histories and
discussions of Quicksort.


1.4  Bucket  Sorting 

Depending on the reference, this class of sorts is called
Bucket, Distributive, Radix, or List Sorting [AHO74, BAAS78,
GOOD77, HORO78, KNUT73], but the idea is all the same.
Elements are distributed according to the value of their keys
into buckets. Each bucket will be assigned those elements in a
predefined range. Each bucket is then ordered by bucket
sorting recursively or by some other sorting method. Then the
items in the buckets are linked together to produce the final
sorted sequence.

For  example,  here  are  Knuth's  numbers  again. 

503  87  512  61  908  170  897  275  653  426  154  509  612  677  765  703 
Suppose 4 buckets are created where the range 0-999 is
evenly divided such that:

Bucket #    Values
1           0-249
2           250-499
3           500-749
4           750-999

Then on the first pass

Bucket Heads    Links
1    ->  87 -> 61 -> 170 -> 154 -> nil
2    ->  275 -> 426 -> nil
3    ->  503 -> 512 -> 653 -> 509 -> 612 -> 677 -> 703 -> nil
4    ->  908 -> 897 -> 765 -> nil

Recursively sorting bucket #1, 4 new buckets are created

Bucket #    Values
1.1         0-62.5
1.2         62.5-125
1.3         125-187.5
1.4         187.5-250

Upon distributing

Bucket Heads    Links
1.1  ->  61 -> nil
1.2  ->  87 -> nil
1.3  ->  170 -> 154 -> nil
1.4  ->  nil


Notice  that  61  and  87  are  now  sorted.  Continuing  to  sort  these 
smaller  buckets  and  appending  them  to  one  another  will  yield  a 
sorted  set  of  items.  Knuth  calls  this  method  Multiple  List 
Insertion  (MLI). 
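
The distribution step of this example can be sketched with ordinary Python lists standing in for the linked buckets. The function names, the half-open range [lo, hi), and the recursion limit `depth` are assumptions made for the illustration; they are not part of Knuth's presentation of MLI.

```python
def distribute(items, lo, hi, n_buckets=4):
    """Split items lying in [lo, hi) into n_buckets equal-width buckets."""
    width = (hi - lo) / n_buckets
    buckets = [[] for _ in range(n_buckets)]
    for x in items:
        j = min(int((x - lo) / width), n_buckets - 1)
        buckets[j].append(x)
    return buckets


def bucket_sort(items, lo, hi, n_buckets=4, depth=8):
    """Recursive bucket sort; falls back to sorted() when depth runs out."""
    if len(items) <= 1 or depth == 0:
        return sorted(items)
    width = (hi - lo) / n_buckets
    out = []
    for j, b in enumerate(distribute(items, lo, hi, n_buckets)):
        out.extend(bucket_sort(b, lo + j * width, lo + (j + 1) * width,
                               n_buckets, depth - 1))
    return out


if __name__ == "__main__":
    keys = [503, 87, 512, 61, 908, 170, 897, 275,
            653, 426, 154, 509, 612, 677, 765, 703]
    print(distribute(keys, 0, 1000))      # first pass, four buckets
    print(bucket_sort(keys, 0, 1000))
```

The first call to `distribute` reproduces the four bucket lists shown above, and distributing bucket #1 over the range 0 to 250 reproduces buckets 1.1 through 1.4.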
Another variation of this method is to create ten buckets
and scan the key's digits from the least significant digit to
most significant digit (right to left), dropping them into the
appropriate buckets as we go along. This is commonly known as
the LSD Radix Sort method. For example, here are Knuth's
numbers:

503  87  512  61  908  170  897  275  653  426  154  509  612  677  765  703

For each successive pass after the first, buckets 0 through 9
are LSD sorted until all three significant decimal places have
been scanned, as in Figure 1.3.

The time to perform this sort is just the number of
significant places, d, times the number of items. So this
algorithm makes d passes, takes dn steps, and is O(n).
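
A minimal sketch of LSD radix sorting for non-negative decimal integers follows. Python lists replace the linked buckets, and the function names and the `digits` parameter are assumptions made for the illustration.

```python
def radix_pass(keys, d):
    """Distribute keys into ten buckets by decimal digit d (0 = ones place)."""
    buckets = [[] for _ in range(10)]
    for k in keys:
        buckets[(k // 10 ** d) % 10].append(k)
    return buckets


def lsd_radix_sort(keys, digits=3):
    """Sort non-negative integers that have at most `digits` decimal digits."""
    for d in range(digits):
        # Collect buckets 0..9 in order; arrival order inside a bucket is
        # kept, which is what makes the later passes correct.
        keys = [k for bucket in radix_pass(keys, d) for k in bucket]
    return keys


if __name__ == "__main__":
    keys = [503, 87, 512, 61, 908, 170, 897, 275,
            653, 426, 154, 509, 612, 677, 765, 703]
    print(radix_pass(keys, 0))
    print(lsd_radix_sort(keys))
```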

Karp has shown that Knuth's Multiple List Insertion is
also O(n) if the distribution function of the data is
well-behaved [KNUT73, p.105]. Without going into the details,
Knuth also shows that the best case for MLI is when the number
of buckets, M, is equal to the number of items, n, where the
items to be sorted are uniformly distributed. These concerns
will reappear for Distributive Partitioning Sorting.
Figure 1.3
LSD Radix Sorting

Bucket Head    Pass 1 (ones place)
0  ->  170 -> nil
1  ->  61 -> nil
2  ->  512 -> 612 -> nil
3  ->  503 -> 653 -> 703 -> nil
4  ->  154 -> nil
5  ->  275 -> 765 -> nil
6  ->  426 -> nil
7  ->  87 -> 897 -> 677 -> nil
8  ->  908 -> nil
9  ->  509 -> nil

Bucket Head    Pass 2 (tens place)
0  ->  503 -> 703 -> 908 -> 509 -> nil
1  ->  512 -> 612 -> nil
2  ->  426 -> nil
3  ->  nil
4  ->  nil
5  ->  653 -> 154 -> nil
6  ->  61 -> 765 -> nil
7  ->  170 -> 275 -> 677 -> nil
8  ->  87 -> nil
9  ->  897 -> nil

Bucket Head    Pass 3 (hundreds place)
0  ->  61 -> 87 -> nil
1  ->  154 -> 170 -> nil
2  ->  275 -> nil
3  ->  nil
4  ->  426 -> nil
5  ->  503 -> 509 -> 512 -> nil
6  ->  612 -> 653 -> 677 -> nil
7  ->  703 -> 765 -> nil
8  ->  897 -> nil
9  ->  908 -> nil
It should be noted that although this method is both
theoretically and practically faster than Quicksort, it uses
much more storage. Recall that Quicksort only needed a
fractional amount of extra space to maintain a stack to
eliminate recursion. Bucket sorts need lots of extra space to
maintain the buckets. The amount of extra space will be
proportional to the number of buckets plus the number of
items. Most implementations use extra space for bucket heads
to show which item is first in each bucket, and for item links
to show which items are in each bucket. The worst case for
bucket sorts is when all but one item goes into one bucket on
each pass. This leads to an amount of work:

$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$$

which is $O(n^2)$.

In discussing bucket sorts, Baase suggests [BAAS78]:

"Thus in the worst case a bucket sort would be very
inefficient. If the distribution of the keys is known in
advance, the range of keys to go into each bucket can be
adjusted so that all buckets receive an approximately equal
number of keys."

Before introducing LSD sorting, she says:

"The reader might wonder why we don't use a bucket sort
algorithm recursively to create smaller and smaller buckets.
There are several reasons. The bookkeeping would quickly get
out of hand; pointers indicating where various buckets begin
and information needed to recombine the keys into one file
would have to be stacked and unstacked often. Due to the
amount of bookkeeping necessary for each recursive call, the
algorithm should not count on ultimately having only one key
per bucket, so another sorting algorithm will be used anyway to
sort small buckets. Thus if a fairly large number of buckets
is used in the first place, there is little to gain and a lot
to lose by bucket sorting recursively."
These are precisely the issues this thesis will address
with regard to Distributive Partitioning Sorting. (It is
interesting to note these comments were published in the same
year as DPS.)


1.5  Distributive  Partitioning  Sorting 
The  previous  sections  were  intended  to  give  the  reader 
enough  background  to  fully  understand  and  appreciate  the 
advantages  and  disadvantages  of  the  sorting  algorithm  about  to 
be  presented. 

On the one hand, DPS is an extension of Quicksort. Given
n items, instead of two partitions being created on each pass,
n partitions are created. But the similarity to Quicksort
stops there, because DPS is not a comparison-based sort.
Rather, it is a distributive sort more closely resembling Knuth's
Multiple List Insertion, where the number of buckets created
equals the number of items being sorted.

Basically what is done is as follows:

Given n items to be sorted.

Select: Find the maximum, minimum, and median (middle)
elements, all of which can be found in O(n) time.

Partition: Using these values, divide the range of the data
between the maximum and minimum into n intervals (buckets) with
n/2 equal intervals on one side of the median, and another n/2
equal intervals on the other side.

Distribute: For each item, determine which of the intervals
it falls into.

Recurse: For each interval with more than one element in it,
sort the bucket using DPS.

The algorithm as originally published by Dobosiewicz in a
pseudo-ALGOL language now follows in Figure 1.4.

In short, here is what procedure SORT does. The items to
be sorted are stored in array A. Line 11 creates a linked list
S with the i-th item pointing to the (i+1)st in the list. The
list end is designated by S(m) = 0. The first half of the array
is sorted by calling LSORT(1,0,m). The array is reordered by
procedure MACLAREN which looks at the pointers and swaps the
elements around to produce a sorted array A [KNUT73, p.596].
Lines 15 to 18 then complete the sorting on the second half of
the array.

Procedure LSORT does the actual sorting. The array L of
list heads is initialized. This array points to the elements
at the top of the buckets being created. Each of these buckets
can be thought of as being a Last-In-First-Out (LIFO) list. As
an element is found to belong to a particular bucket, it is
considered to be the new list head. Its pointer value is put
into the L array and the S array is consequently updated.

Specifically, LSORT works as follows. The bucket of size
n headed by pointer 'link' will be sorted. Step 1 is the
initialization. Line 26 sets up the list heads for n buckets
by initializing them to zero. Next the maximum, minimum, and
median elements are determined. Lines 28 to 31 check for the
case where all items are equal and consider them sorted.
Figure 1.4
Algorithm SORT

 1)  procedure SORT(A,n)
 2)  integer n
 3)  array A
 4)  begin
       real min, max, median
 5)    integer m
       // the declaration of LSORT should be here //
     preparatory pass:
 6)    FINDMINMAXMED(A,n)
       // FINDMINMAXMED finds the smallest, largest, and
          median elements of an array A of length n. These
          elements are stored in min, max, median respectively.
          The array A is partitioned by the median
          selection algorithm in 2 halves: A(1:⌈n/2⌉) and
          A(⌈n/2⌉+1:n). The Rivest-Tarjan selection algorithm
          is suitable for use here //
 7)    m := (n+1)/2
 8)    begin
 9)      integer array S(1:m)
10)      integer i
     initialize 1st:
11)      for i := 1 step 1 until m-1 do S(i) := i+1
12)      S(m) := 0
     sorting:
13)      LSORT(1,0,m)
         // LSORT does the sorting. The array S will
            contain a list of pointers showing the correct
            order of elements //
14)      MACLAREN(1,n)
         // to complete sorting, it is necessary to reorder
            the input vector A. An algorithm due to H.D.
            Maclaren is doing it in linear time. Any other
            method could be used //
     now sorting of the 2nd half:
     initialize 2nd:
15)      for i := 1 step 1 until n-m-1 do S(i) := i+m+1
16)      S(n) := 0
17)      LSORT(1,m,n-m)
18)      MACLAREN(m+1,n)
19)    end
20)  end SORT
21)  procedure LSORT(link,incr,n)
22)  integer link,incr,n
     // LSORT performs a list sort on the elements of array A
        pointed at by a list stored in array S. The value of
        link gives the head of the list and n is the number
        of elements of the list. The parameter incr is a bit
        tricky: it is used to distinguish between first and
        second halves of the array A. Why is it used?
        Because, in order to save storage, there is a one to
        one correspondence between S and the current half of
        A: if S(i) is in the list, it means that A(i+incr)
        is to be sorted in this pass //
23)  begin
24)    integer array L(1:n)
25)    integer length,i,j,p
       // L is used to store pointers to nonempty lists (kept
          in S) //
26)    for i := 1 step 1 until n do L(i) := 0
     step 1:
27)    LFINDMINMAXMED(link,n)
       // selects the smallest, largest, and median of medians
          elements of list starting with S(link) //
28)    if min = max then
29)    begin
30)      for i := link, S(i) while i > 0 do APPEND(i,incr)
31)    end // this was in case of identical items //
32)    else
     step 2:
33)    for i := link, p while i > 0 do
34)    begin
35)      p := S(i)
36)      j := if A(i+incr) < median then ... else ...
         // a complex expression finding to which group does the
            item A(i+incr) belong //
37)      S(i) := L(j)
38)      L(j) := i
         // item A(i+incr) is put on top of the LIFO list //
39)    end
     step 3:
40)    for j := 1 step 1 until n do
41)      if L(j) > 0 then
42)      begin
43)        length := -1
44)        for i := L(j), S(i) while i > 0 do
             length := length + 1
           // compute the length of j-th list.
              If more than 1 element, call LSORT again, otherwise
              append at the end of sorted part of the vector (a
              list kept in S). In actual program the array L
              should be compacted first: empty groups should be
              deleted. This ensures that a total number of
              pointers will never exceed the number of items //
45)        if length > 1 then LSORT(L(j),incr,length)
46)        else APPEND(L(j),incr)
47)      end
48)  end LSORT
Step 2 is the heart of the sorting. Line 33 is a loop
which will scan down the elements of the bucket headed by the
pointer 'link'. Line 35 saves the value of the pointer to the
next element. The bucket value for that item is then
calculated using "a complex expression" to be discussed in
detail later. Line 37 shifts the LIFO list for that bucket,
and line 38 puts that item at the head of the list.

Step 3 is the recursive step. All n buckets are scanned
to see which ones need to be sorted still further. Line 44
determines the size of the j-th bucket. Lines 45 and 46 will
call LSORT if there are still items to be sorted; otherwise the
sorted item will be appended to the output array.

To  avoid  any  further  confusion,  an  example  follows: 

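Given Knuth's numbers

i       1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
A(i)  503   87  512   61  908  170  897  275  653  426  154  509  612  677  765  703

and that Dobosiewicz's suggested partition formula is used:

if $x \le \text{median}$ then $j = \left\lfloor \dfrac{x - \text{min}}{\text{median} - \text{min}} \cdot \dfrac{n-2}{2} + 1 \right\rfloor$

if $x > \text{median}$ then j is given by an expression in $(x - \text{median})/(\text{max} - \text{median})$ that is not legible in this copy

For reasons to be explained, this example will not sort one half
first and then the other half, but rather will sort everything
all together.

For the first call of LSORT:

min = 61
max = 908
med: $\lceil (n+1)/2 \rceil$ = 9th, so med = 512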
Array S is a set of pointers showing which element is next in
any given bucket. L is an array of pointers showing which
element is at the head of the i-th bucket. Figure 1.5
demonstrates the sorting procedure.

Items in a bucket are found by the algorithm

for i := 1 to n            // For each bucket
    p := L(i)              // List Head
    do until p = 0
        print A(p)
        p := S(p)          // Next Pointer
    end
end

The  size  of  a  bucket  can  be  found  in  a  similar  manner.  By 
recursively  sorting  each  bucket  of  size  greater  than  one,  the 
sorting  process  will  be  complete. 
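
The head and link mechanism can be sketched in Python with two integer arrays. Indices are kept 1-based with 0 acting as nil so that the arrays mirror L and S as described; the function names and the precomputed list of bucket numbers are assumptions made for the illustration.

```python
def distribute(bucket_numbers, n_buckets):
    """Build the head array L and link array S for 1-based items.

    bucket_numbers[i - 1] is the bucket (1..n_buckets) of item i.
    """
    n = len(bucket_numbers)
    L = [0] * (n_buckets + 1)       # L[j]: item at the head of bucket j
    S = [0] * (n + 1)               # S[i]: item that follows i, 0 = end
    for i, j in enumerate(bucket_numbers, start=1):
        S[i] = L[j]                 # the old head becomes the successor
        L[j] = i                    # item i is the new head (LIFO)
    return L, S


def bucket_items(L, S, j):
    """Follow the links of bucket j and return the item indices in order."""
    items = []
    p = L[j]
    while p != 0:
        items.append(p)
        p = S[p]
    return items


if __name__ == "__main__":
    L, S = distribute([2, 1, 2, 1], n_buckets=2)
    print(bucket_items(L, S, 1), bucket_items(L, S, 2))
```

Because each new item is placed at the head of its list, the items of a bucket come back in the reverse of their arrival order, and the length of the returned list is the bucket size.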

It is easy to see that in the best case this algorithm is
O(n). For perfectly equally spaced data, each bucket would
contain one item after the first pass, and everything is
sorted. Although it seems intuitive that this algorithm is
O(n) in the average case, rigorously showing so is difficult.
In his original paper, Dobosiewicz shows that the algorithm is
O(n) in the expected case for uniform distributions. The
reader is referred to this paper for details of the derivation
[DOBO78a].
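
The select, partition, distribute, and recurse steps of this section can be sketched as below. This is a simplified illustration and not Dobosiewicz's published procedure: Python lists replace the linked lists, the bucket formula is an assumed one that puts keys below the median into the first n/2 intervals and the remaining keys into the others, and `statistics.median_high` is used for brevity even though it sorts internally, so a linear-time selection algorithm would be needed to keep each pass O(n).

```python
import statistics


def dps_sort(items):
    """Distributive partitioning sketch; returns a new sorted list."""
    n = len(items)
    if n <= 1:
        return list(items)
    lo, hi = min(items), max(items)
    if lo == hi:                        # all keys identical: nothing to do
        return list(items)
    med = statistics.median_high(items)
    half = n // 2                       # intervals below the median
    upper = n - half                    # intervals from the median upward
    buckets = [[] for _ in range(n)]
    for x in items:
        if x < med:
            j = min(int((x - lo) / (med - lo) * half), half - 1)
        elif hi == med:
            j = half
        else:
            j = half + min(int((x - med) / (hi - med) * upper), upper - 1)
        buckets[j].append(x)
    out = []
    for b in buckets:                   # recurse only on crowded buckets
        out.extend(dps_sort(b) if len(b) > 1 else b)
    return out


if __name__ == "__main__":
    keys = [503, 87, 512, 61, 908, 170, 897, 275,
            653, 426, 154, 509, 612, 677, 765, 703]
    print(dps_sort(keys))
```

Taking the upper median guarantees that the smallest and the largest key never share a bucket when they differ, so every recursive call works on a strictly smaller list.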
Figure 1.5
Example of Distributive Partitioning Sorting

[Figure: contents of the arrays A, S, and L after each item i is
placed in bucket j by the assignments S(i) = L(j) and L(j) = i.
The legible steps include i = 2, A(2) = 87, L(1) = 2; i = 3,
A(3) = 512, j = 8; and, after all the items have been scanned,
i = 16, A(16) = 703, j = 13, S(16) = 0, L(13) = 16.]

Now consider the worst case. This occurs when, for each
half of the buckets divided by the median, n/2 - 1 elements go
into one bucket and one element goes into some other bucket.
Therefore out of n buckets only four get used. The time, T(n),
in the worst case is given by the recurrence formula:

$$T(n) = cn + 2T(n/2), \qquad T(1) = c_0$$

Solving this in a similar fashion as with Quicksort's best case:

$$T(n) = cn \log n + O(n)$$

which is O(n log n).

It has been shown that DPS has a great advantage over
classical comparison-based sorts because it is O(n) in the
expected case. DPS is also faster because there is no swapping
of elements as in exchange sorts like Quicksort. Rather,
linked lists are kept, where link values are replaced to
reflect the changing bucket statuses. The data values
themselves never move, but instead link values change. In this
way, DPS is much like Knuth's Multiple List Insertion, which has
many of the same qualities and characteristics.

1.6  Problems with DPS

There  are  many  problems  with  DPS  in  both  a  theoretical  and 
practical  sense.  They  vary  from  fundamental  problems  with  the 
algorithm,  to  large  time  and  space  overheads,  to  theoretically 
bad  worst  case  running  times.  These  will  be  discussed  here 
with  suggested  solutions  that  form  the  basis  for  the 
implementations  developed  in  this  research. 




There  are  several  flaws  in  the  algorithm  as  originally 
published.  First  of  all,  it  is  not  necessary  to  find  the 
maximum,  minimum,  and  median  in  the  preparatory  pass  (line  6). 

Then there is a logic problem with the divide and conquer
approach used in procedure SORT. The logic divides the array
in half. The first call of LSORT sorts the first half of the
array and the whole array is then reordered by procedure
MACLAREN. For example:

SORT( 7 8 10 3 6 9 4 11 )

LSORT( 7 8 10 3 )

MACLAREN yields ( 3 7 8 10 6 9 4 11 )

Then LSORT is called for the second half of the array:
( 6 9 4 11 ). This will yield two sorted vectors:

( 3 7 8 10 4 6 9 11 )

These now need to be merged to produce the correct ordering.
To correct this problem, LSORT needs to be called only once to
sort the entire array, and the code for 'incr' is eliminated.
Otherwise the two sorted halves need to be merged.

Another  problem  is  the  partition  formula,  restated  here: 
if  X  <  median 


ilii  ^ 

In  the  else  clause,  the  range  of 


x-roedi an 
max-median 


is  (0,1] 
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which  win  yield  values  in  the  else  expression  of 


( 


n*! 

2  »  n 


] 


For  example,  with  16  buckets  this  formula  will  give  values  from 

9  to  16,  which  is  the  desired  result.  It  can  be  concluded  that 

the  else  clause  is  a  valid  expression. 

Now consider the then clause, where the range of

$$\frac{x - \text{min}}{\text{median} - \text{min}}$$

is [0,1]. This is inclusive because the median is included in the
domain of the formula. The entire formula will yield values

$$\left[ 1 ,\; \left\lfloor \frac{n+1}{2} \right\rfloor \right]$$

For example, with 16 buckets, this will give values from 1 to
8. However, in only one case will these formulas ever yield a
value of 8. That case occurs when the bucket value for the
median is being calculated. A better choice for the
$\lfloor (n+1)/2 \rfloor$th bucket should be made.

This can be done by replacing min in the denominator of the then
expression with min2, where min2 = min - 0.0000001. So now the
range for (x-min)/(median-min2) is [0,1), and the range for the
then clause is $[1, \lfloor (n+1)/2 \rfloor)$. With 16 buckets,
this will yield values from 1 to 7, which is closer but not quite
there. Now 8 is missing. It will soon be seen how this problem is
solved.

For now, consider the case when n=2 in the then clause.
For example, a bucket contains values

( 61 87 )

Here, FINDMAXMINMED will yield

MAX = 87  MIN = 61  and  MED = 61

The median is chosen to be the $\lfloor (n+1)/2 \rfloor$th (or 1st)
element. Using these values in the then expression will give a zero
or some other small value in the denominator. This can easily be
corrected by choosing the median to be the $\lceil (n+1)/2 \rceil$th
element. So now:

MAX = 87  MIN = 61  and  MED = 87

But now upon evaluating the then clause, both elements yield
bucket values of 1. This is because the (n-2)/2 term of the
expression will be zero since n=2. A suggested correction for
this is to evaluate the then clause as

$$\left\lfloor \frac{x - \text{min}}{\text{median} - \text{min2}} \cdot \frac{n}{2} + 1 \right\rfloor$$

which yields integer values in the range $[1, \lfloor (n/2)+1 \rfloor)$.

So  for  16  buckets,  the  values  1  to  8  will  be  generated,  where 
now  8  is  included  as  desired.  For  odd  n,  the  extra  bucket 
value  will  be  generated  in  the  then  clause. 
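
The corrected then clause can be checked numerically. The sketch below is an illustration only: the function name `then_bucket` and the test range 0 to 100 are assumptions chosen for the example, while the offset 0.0000001 and the n/2 scaling follow the expression described in the text.

```python
import math

EPSILON = 0.0000001  # the small offset the text subtracts from min to form min2


def then_bucket(x, minimum, median, n):
    """Bucket index for an item x with minimum <= x <= median."""
    min2 = minimum - EPSILON
    return math.floor((x - minimum) / (median - min2) * n / 2 + 1)


# With 16 buckets, items between the minimum (0) and the median (100)
# should be spread over buckets 1 through 8.
buckets = sorted({then_bucket(x, 0, 100, 16) for x in range(0, 101)})
print(buckets)
```

Because the ratio stays strictly below 1, the median itself falls in bucket n/2 rather than overflowing into bucket n/2 + 1.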

There  is  yet  a  further  problem  with  the  then  clause. 
Consider  the  case  where  the  median  is  equal  to  or  nearly  equal 
to  the  minimum.  Previously,  the  concern  was  that  a  zero  In  the 
denominator  was  likely.  This  might  still  be  the  case  for  an 
input  vector  such  as: 

(  61  61  61  61  87  92  ) 

Here:  MAX = 92  MED = 61  and  MIN = 61

Again there is the same problem as before. One way to get
around this is to add the code:

if min = median then median := max

This way everything will be evaluated in the then clause. Note
that the case where the median is equal to the maximum is of no
concern. This is because the else clause would never get
evaluated in such a case. But the median
still causes some headaches, as will soon be seen.

DPS  proved  to  be  very  slow  when  implemented  as  previously 
presented. In fact, the running time was 75% slower than
Quicksort  on  the  average.  It  was  obvious  that  to  become 

compatible with results published in Information Processing
Letters, the algorithm needed to be optimized as much as
possible. Later publications indicated that a certain amount
of code optimization was being done on the original DPS
algorithm.

As  pointed  out  by  Lamagna,  Bass,  and  Anderson  [LAMA80], 
the  consideration  of  constant  factors  in  algorithms  is 
important,  as  is  the  case  here.  Sloppy  code  and  inefficient 
algorithms  can  be  the  source  of  large  bottlenecks.  DPS 
requires  the  selection  of  the  maximum,  minimum,  and  median 
elements. The published version of DPS used a 15n crude median
selection algorithm [KNUT73, p. 216] and it is possible to use
an inefficient 2n method to find the maximum and minimum.

The implementation of DPS in this thesis uses
Floyd-Rivest's 1.5n exact median selection algorithm [FLOY75],
and a 1.5n maximum-minimum selection algorithm; a significant
savings. This median selection method chooses the exact
median, as opposed to the crude estimate method used in the
prior implementation. Although there is still a high overhead
for these algorithms, it is by no means as great as for the
methods originally suggested by Dobosiewicz. The Floyd-Rivest
method uses n extra storage as opposed to n/2 extra for the
suggested method, but the time savings is well worth the extra
space. The reader is referred to [BLUM73, FUSS79, SCHO76] for
more details of median selection.
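
One well-known way to find both the maximum and minimum in about 1.5n comparisons is to process the items in pairs. The sketch below illustrates that idea; it is an assumption that the thesis used this particular scheme, and the name `min_max` is hypothetical.

```python
def min_max(items):
    """Return (minimum, maximum, comparisons) using pairwise comparisons."""
    n = len(items)
    if n == 0:
        raise ValueError("items must not be empty")
    comparisons = 0
    if n % 2 == 1:
        # Odd length: the first item starts as both minimum and maximum.
        lo = hi = items[0]
        start = 1
    else:
        # Even length: one comparison orders the first pair.
        comparisons += 1
        if items[0] < items[1]:
            lo, hi = items[0], items[1]
        else:
            lo, hi = items[1], items[0]
        start = 2
    for i in range(start, n, 2):
        a, b = items[i], items[i + 1]
        # Three comparisons per pair: order the pair, then test each end.
        comparisons += 3
        if a < b:
            small, large = a, b
        else:
            small, large = b, a
        if small < lo:
            lo = small
        if large > hi:
            hi = large
    return lo, hi, comparisons


print(min_max([7, 8, 10, 3, 6, 9, 4, 11]))
```

A naive scan that tests every item against both the current minimum and maximum would use about 2n comparisons, which is the inefficient method the text refers to.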

There  are  two  bottlenecks  that  arise  in  DPS  as  they  did  in 
Quicksort. One of these is the problem brought about by
recursion. As Lamagna, Bass, and Anderson [LAMA80] point out,
much time and space is used in most implementations of
recursion. In most cases the overhead can be eliminated by
creating a stack to store crucial values and efficiently coding
some type of outer loop to simulate the recursion. This is
true with DPS also. It has been found by this author that the
recursion can be removed without creating extra stacks or using
extra space. This is done by taking advantage of a suggestion
Dobosiewicz makes in Step 3. He states that "the array L
should be compacted", and this can be done quickly in O(n).
The  extra  available  space  created  by  this  compression  allows 
room  for  any  needed  bucket  heads  in  subsequent  levels  of 
recursion.  Everything  else  being  performed  with  pointers  is 
done  in  place,  so  no  extra  space  is  required. 

Additionally, there is the matter of what to do with small
buckets, which is similar to the problem of Quicksort's small
subfiles. At some point it becomes advantageous to use an
efficient sorting routine on these small buckets rather than
recursively using DPS until the bucket size is 1. It was found
that for DPS, it is more efficient to use an Insertionsort on
bucket sizes less than or equal to 9 or 10.

Further improvements can be made in the algorithm's run
time by optimizing the code. It is possible to combine certain
loops in what can be considered to be the heart of the
algorithm in Steps 2 and 3. Step 2 performs many common
arithmetic operations repeatedly. These may be removed from
the loop to save time. It is also possible to optimize Step 3
and the code resulting from removing recursion such that a very
tight efficient loop can be implemented.

Unfortunately, there are certain characteristics of the
algorithm that cannot be dealt with. It turns out that due to
the nature of the Last-In-First-Out (LIFO) lists used as
pointers for the buckets, DPS is unstable. Records with equal
keys may not necessarily be output in the same relative order
in which they were input. In fact, if an odd number of passes
is made over the equal keys, they will be sorted in reverse
relative order. If an even number of passes is made, the items
will be stably sorted. Quicksort also has a stability problem,
although the reasons for this are entirely different. It is an
interesting phenomenon.
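
The reversal of equal keys can be seen in a small model of the LIFO bucket lists. The sketch below is illustrative only: `distribute_once`, `head`, and `link` are hypothetical names standing in for one distribution pass and for the arrays L and S, and the three records are assumed to share a single key.

```python
def distribute_once(order, bucket_of, n_buckets):
    """One distribution pass with LIFO bucket lists; returns the new order."""
    head = [None] * n_buckets   # plays the role of L: head of each bucket list
    link = {}                   # plays the role of S: successor of each item
    for item in order:
        j = bucket_of(item)
        link[item] = head[j]    # S(i) := L(j)
        head[j] = item          # L(j) := i
    result = []
    for j in range(n_buckets):
        item = head[j]
        while item is not None:
            result.append(item)
            item = link[item]
    return result


def same_bucket(item):
    """All records share one key, so they all land in bucket 0."""
    return 0


records = ["a", "b", "c"]
first = distribute_once(records, same_bucket, 1)
second = distribute_once(first, same_bucket, 1)
print(first, second)
```

Each pass pushes items onto the front of a list, so one pass reverses the equal-key records and a second pass restores their input order.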

As Baase pointed out, there is a large storage overhead
associated with this type of algorithm [BAAS78]. For the
pointers alone, the overhead is 2n, and for the median
selection it is an additional n. However, with today's virtual
memory environments, the impact of this consideration is
minimized. Only for extremely large n would problems in
storage occur.

The reader is referred to [AKER78, BURT78, 0ATA78,
DOBO78b, DOBO79, HUTT79, JACK79] for additional arguments

concerning  the  practical  significance  of  Distributive 
Partitioning  Sorting  which  are  of  little  concern  to  this  work. 
The  DPS  algorithm  used  in  this  work  is  presented  in  Figure  1.6. 

Summary of suggested improvements to DPS:

. Delete preparatory pass
. Remove divide-and-conquer approach
. Adjust the partitioning expression
. Handle the minimum = median case
. Remove recursion
. Optimize loops
    . Multiply by 0.5 instead of dividing by 2.0
    . Eliminate mixed mode arithmetic
    . Take common expressions out of loops
    . Choose the median = (min+max)/2
. Use Insertionsort on small buckets




Figure 1.6
Algorithm DPS


procedure DPS(n,A)
    integer array L(n),S(n)
    integer n,length,i,j,p,link
    array A

    for i:= 1 step 1 until n do L(i):=0
    L(n):=1
    for i:= 1 step 1 until n-1 do S(i):=i+1
    S(n):=0
    top:= length:=n

    while (top ≤ n)    // Recurse //
        FINDMAXMIN(L(top))
        // Find max and min of list pointed at by L(top) //
        // Note: Adaptive methods are placed here //
        link:=L(top)
        L(top):=0
        if min = max then APPEND(link)
        else
            for i:= link,p while i>0 do
            begin
                p:=S(i)
                j:= partitioning formula that distributes A(i)
                S(i):=L(j)
                L(j):=i
            end
        COMPRESS(L,top,n,length)
        // List heads are pushed to the back of array L such
           that the front of the array is all zeroed. top:=
           the first non-zero pointer //
        length:=0
        for i:=L(top),S(i) while i>0 do length:=length+1
        do while (length < insertionsort cutoff)
            APPEND(L(top))
            L(top):=0
            top:=top+1
            if top ≤ n then    // Find length of next bucket //
            begin
                length:=0
                for i:=L(top),S(i) while i>0 do
                    length:=length+1
            end
            else exit DPS
        end
end DPS

Having addressed most of the issues raised by Baase earlier, the
adaptive methods of DPS can now be discussed. As it has been shown,
the worst case for DPS is O(n log n), which is no better than
Quicksort on the average. The question is: Can anything be gained
by knowing something about the distribution of the data in advance?
And, if so, is it worth it?

CHAPTER  II 


ADAPTIVE  METHODS  FOR  UNKNOWN  DISTRIBUTIONS 

Recently,  work  was  done  by  Meijer  and  Akl  [MEIJ80]  to  try 

to  "Hybrid"  Distributive  Partitioning  Sorting  according  to  a 

known  distribution.  Though  this  work  is  promising,  it  is  by  no 

means  general  enough  to  handle  empirical  or  general 

distributions.  The  authors  suggest, 

"...when  the  distribution  of  the  input  sequence  is 
not  known,  another  topic  for  future  research  would  be 
to  study  the  problems  associated  with  estimating  this 
distribution." 

When  that  paper  was  published,  the  proposal  for  this  thesis  was 
independently  being  formulated  and  exactly  those  ideas  were 
suggested  as  a  course  of  thesis  work.  (In  fact,  this  author 
did  not  receive  the  above  publication  until  seven  months  after 
it  was  issued,  and  the  thesis  work  was  well  into  the 

experimental  stage.) 

The  purpose  of  exploring  adaptive  methods  for  DPS  is  to 

improve  its  worst  case  performance.  It  is  readily  seen  that  as 
the  data  distribution  becomes  more  and  more  skewed,  the  worst 

case  is  approached.  Dobosiewicz  shows  that  the  worst  case  is  a 
set  of  factorials.  It  is  desired,  then,  that  these  methods  be 
'adaptive'  in  the  sense  that  they  adjust  to  whatever  data 
distribution  is  given. 

Recall  that  DPS  divides  the  range  of  the  data  into  n 

partitions  based  on  the  maximum,  minimum,  and  median  (or  mean) 
values. Half of these partitions are all of one fixed length,
and the other half of another length. For skewed
distributions, the resulting bucket sizes could vary greatly.
It is the goal of the adaptive methods to examine the data
distribution and somehow transform these potentially large
bucket sizes into buckets with as close to one item per bucket
as possible (DPS's best case). Figure II.1 graphically shows
these ideas and concerns.

The  search  for  various  adaptive  methods  has  crossed  many 
different  fields  of  mathematics,  including:  linear  algebra, 

numerical  analysis,  statistics,  probability,  combinatorics,  and 
plain old "horse sense" math. These approaches will be
discussed in this section along with their advantages and
disadvantages.

A  number  of  questions  arise  as  these  adaptive  methods  are 
being  looked  at: 

.  What information about the distribution will be useful?

.  How  can  the  information  be  used  to  obtain  the  goal? 

.  Is  the  information  and  goal  obtained  easily  and 
quickly  at  relatively  little  cost? 


In discussing DPS, the term bucket size refers to the
number of items per bucket. Partition length refers to
the length of a partition within the range of the data.

II.1 Frequency Distribution Curves

If a small sample of the data is taken and distributed
into buckets, then the bucket sizes can be thought of as being
a set of frequency occurrences. Often this set of frequencies
fits a theoretical distribution such as Uniform, Normal,
Poisson, or Exponential, as shown in Figure II.2. Many times a

curve  can  be  fit  to  these  frequencies.  Usually,  the 
probabilities  of  the  frequencies  are  found  and  a  curve  is  fit 
to  them.  This  is  known  as  a  Probability  Density  Curve.  By 
finding  such  a  curve,  it  might  be  possible  to  find  a 

transformation  to  appropriately  adjust  the  partition  lengths  so 
the  bucket  sizes  are  more  uniform.  Some  methods  of  finding 
Probability  Density  Curves  will  now  be  discussed. 

II.1.1 Method of Moments

Various statistics concerning distributions can be
gathered such as the mean, skewness, kurtosis, and others.
These are called moments. The moments about the origin for the
elements $x_1, x_2, \ldots, x_n$ can be calculated by

$$m_r = \frac{1}{n} \sum_{i=1}^{n} x_i^{\,r}$$
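
The moment about the origin is straightforward to compute directly. The following sketch is illustrative; the function name `raw_moment` and the sample values are assumptions made for the example.

```python
def raw_moment(values, r):
    """r-th moment about the origin: the average of x**r over the values."""
    n = len(values)
    return sum(x ** r for x in values) / n


sample = [1.0, 2.0, 3.0, 4.0]
mean = raw_moment(sample, 1)      # first moment is the mean
second = raw_moment(sample, 2)    # second moment about the origin
print(mean, second)
```

Each additional moment costs another pass of n multiplications and additions, which is the source of the overhead discussed later in this section.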

According to Elderton and Johnson [ELDE69], if n is the
number of points and $m_r$ is the r-th moment, then a frequency
distribution curve can be fit to:

1)  $y = a + bx$

2)  $y = a + bx + cx^2$

3)  $y = a + bx + cx^2 + dx^3$

where, in each case, the coefficients are computed from n and
the moments $m_0$, $m_1$, $m_2$, and $m_3$ [ELDE69].


A system of equations fitting various commonly occurring
distributions was developed by Karl Pearson. These equations
are classified by the roots of

$$b_0 + b_1 x + b_2 x^2 = 0$$

Type I) If the roots are real and of different sign,

$$y = y_0 \left( 1 + \frac{x}{a_1} \right)^{m_1} \left( 1 - \frac{x}{a_2} \right)^{m_2}$$

where $a_1 = \text{root}_1$ - (distance from origin to mode),
$a_2 = \text{root}_2$ - (distance from origin to mode),
and $m_1/a_1 = m_2/a_2$.

Type VI) If the roots are of the same sign,

$$y = y_0 \,(x - a)^{q_2}\, x^{-q_1}$$


And  so  on  for  other  types. 

There is also a set of normal curves known as Gram-Charlier
curves which use moments in fitting a curve to a distribution
[GRAM45]. Given a normal frequency function f(x), and g(x) is
the standard normal function where mean=0 and variance=1, such
that

$$g(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$$

then

$$f(x) = g(x) + \frac{c_3}{3!}\, g^{(3)}(x) + \frac{c_4}{4!}\, g^{(4)}(x) + \cdots$$

where

$$\begin{aligned}
c_3 &= -m_3 = -(\text{skewness coefficient}) \\
c_4 &= m_4 - 3 = \text{excess coefficient} \\
c_5 &= -m_5 + 10 m_3 \\
c_6 &= m_6 - 15 m_4 + 30
\end{aligned}$$

The problem with these moment methods is that an excessive
amount of time is used in determining the moments and
coefficients of the equations. The number of arithmetic
operations being performed would rapidly become very large.
Although moment methods would provide a good guess to the
distribution, they lack the efficiency that is desired for a
modification to DPS.

II.1.2 Curve Fitting

Another way to 'discover' the distribution is to try to
fit a curve [DANI80] to the probability density function based
on the sampled probabilities. The first thought that might
come to mind is to try a high degree least squares fit
[STRA76], such as a third or fourth degree polynomial fit.
Although a large number of calculations are needed, it would
not be as great as with the moment calculations, especially if
the number of sampling cells is kept relatively small.


The least squares fit would work nicely if the distribution
were smooth. In practice, though, many distributions do not fit
smooth, monotonic, or well behaved curves (e.g., dictionary data,
last names, social security numbers). Least squares methods
might yield a badly fitting curve [GERA78].

Suppose a straight line fit is used between cell probabilities
as shown in Figure II.3. For each line, the
slope and y-intercept can be saved and used later to determine
what bucket to adjust an item to. But can this practically be
done? Given an item and this sample probability density curve,
the item's relative position in the range needs to be found.
The only sensible way to accomplish this is to determine the
probability that an item will fall into the i-th bucket with
respect to the rest of the data. This suggests we need to find
the Cumulative Distribution Function (CDF) as opposed to the
Probability Density Curve (PDC).

Figure II.3
Line Fit to Frequency Probabilities

The CDF can be found by integrating the PDC. The
preprocessing necessary to find the PDC by these prior
techniques would be quite time consuming. A method will now be
suggested to find the CDF quickly and efficiently.

II.2 Cumulative Distribution Function (CDF) Method

A more useful tool for adaptive methods is the Cumulative
Distribution Function. This was used independently in work
done by Meijer and Akl [MEIJ80] for known distributions. If
f(t) is the Probability Density Curve, then the Cumulative
Distribution Function is

$$F(x) = \int_{-\infty}^{x} f(t)\, dt$$

If the probability, $p_i$, for an item falling into the
i-th sampling cell is given, then the cumulative
probability distribution for the i-th cell is

$$P_i = \sum_{k=0}^{i} p_k$$

A useful property of this curve is that it is continuous
monotonic nondecreasing. Fitting a line between each
successive pair of sampling cells should give a good
estimate of the Cumulative Distribution Function.

It will now be shown that if the cumulative
distribution function is known or can be approximated,
then the resulting transformation of items by this
function is uniform.

Given:

X is the underlying random variable of the data.

$$G_X(x) = P(X \le x) \qquad (1)$$

is the Cumulative Distribution Function. It is continuous
monotonic increasing so the inverse, $G_X^{-1}(x)$, also
exists.

Y is a random variable where $Y = G_X(X)$ is the transformed
data. To find the distribution of Y, we observe that the
distribution of Y is uniform on [0,1] because

$$\begin{aligned}
F_Y(y) &= P(Y \le y) && \text{by def of CDF} \\
       &= P(G_X(X) \le y) && \text{substitution} \\
       &= P(X \le G_X^{-1}(y)) && \text{inverse of both sides} \\
       &= G_X(G_X^{-1}(y)) && \text{by def (1)} \\
       &= y
\end{aligned}$$

Therefore the transformation is uniform.
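
The result above can be illustrated numerically. The sketch below is not from the original report: it assumes exponentially distributed data (generated deterministically from evenly spaced quantiles), uses the known exponential CDF, and introduces the hypothetical helpers `exponential_cdf` and `bucket_counts`.

```python
import math


def exponential_cdf(x):
    """CDF of the unit exponential distribution."""
    return 1.0 - math.exp(-x)


def bucket_counts(values01, n_buckets):
    """Count how many values in [0,1] fall into each of n_buckets equal parts."""
    counts = [0] * n_buckets
    for v in values01:
        counts[min(int(v * n_buckets), n_buckets - 1)] += 1
    return counts


n = 1000
# Skewed data: the (i + 0.5)/n quantiles of a unit exponential distribution.
data = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]
lo, hi = min(data), max(data)

# Equal-length partitions of the raw range give very uneven bucket sizes ...
raw = bucket_counts([(x - lo) / (hi - lo) for x in data], 10)
# ... while partitions of the CDF-transformed values are filled evenly.
transformed = bucket_counts([exponential_cdf(x) for x in data], 10)
print(raw)
print(transformed)
```

In practice the true CDF is unknown, which is why the method that follows estimates it from a sample.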

Figure II.4 shows how the CDF takes any distribution
along the x-axis and transforms it onto a uniform
distribution on the y-axis. This implies that if a sample
CDF can be found, then the resulting items can be spread
uniformly among the buckets. This is essentially what is
done "in reverse" when a uniform random distribution is
transformed into another distribution in simulation
systems [GRAY80].

Figure II.4
Cumulative Distribution Function
Uniform Transformation

Figure II.5 shows a probability density curve and its
corresponding cumulative distribution function.

A very simple, practical algorithm can be written using
the idea of cumulative distribution functions to create an
adaptive DPS method.

Step 1) Sample the data and distribute it into cells by

$$k = \left\lfloor \frac{x - \text{min}}{\text{mx2} - \text{min}} \cdot M \right\rfloor + 1$$

where mx2 = max + 0.0000001 and M is some arbitrary
number of sampling cells. This formula yields integers
in the range [1,M].

Step  2)  Find  the  cumulative  probabilities  of  the  M  cells. 

Step 3) Fit a line between each pair of cumulative
probabilities using $P_0 = 0.0$ and $P_M = 0.9999999$. Save
the slope and y-intercept of each line. This yields a
sample Cumulative Distribution Function.

Step  4)  Distribute  all  of  the  items  by  first  determining 
which  sample  cell  it  belongs  to,  say  k,  and  then  use 
the  k-th  line  equation  to  find  what  bucket  the  item 
really  falls  into.  Note  that  to  insure  the  CDF  is 
monotonic  increasing,  as  opposed  to  nondecreasing,  each 
sampling  cell  is  initialized  to  have  one  item.  This  is 
to  guarantee  the  inverse  CDF  function  exists. 

Figure  II. 6  shows  these  steps  pictorially,  and  Figure  II. 7 
describes  the  algorithm  in  detail. 
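
The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis code: the function name `cdf_buckets`, the choice of sample, the floor-based cell formula, and the clamping against rounding error are all assumptions made for the example.

```python
EPSILON = 0.0000001


def cdf_buckets(data, sample, num_cells):
    """Return a bucket number in 1..len(data) for every item in data."""
    n = len(data)
    lo, hi = min(data), max(data)
    hi2 = hi + EPSILON                  # keeps (x - lo)/(hi2 - lo) below 1
    width = (hi2 - lo) / num_cells

    def cell_of(x):
        # Step 1: cell index in 1..num_cells (clamped against rounding error).
        return min(int((x - lo) / (hi2 - lo) * num_cells) + 1, num_cells)

    # Every cell starts with one item so the sample CDF is strictly increasing.
    freq = [0] + [1] * num_cells
    for x in sample:
        freq[cell_of(x)] += 1

    # Step 2: cumulative probabilities, with the end values fixed as in Step 3.
    total = sum(freq)
    cum = [0.0] * (num_cells + 1)
    for k in range(1, num_cells + 1):
        cum[k] = cum[k - 1] + freq[k] / total
    cum[num_cells] = 1.0 - EPSILON

    # Step 3: slope and y-intercept of the line across each cell.
    slope = [0.0] * (num_cells + 1)
    intercept = [0.0] * (num_cells + 1)
    for k in range(1, num_cells + 1):
        left = lo + (k - 1) * width
        slope[k] = (cum[k] - cum[k - 1]) / width
        intercept[k] = cum[k - 1] - slope[k] * left

    # Step 4: map each item through its cell's line, then scale to n buckets.
    buckets = []
    for x in data:
        k = cell_of(x)
        y = slope[k] * x + intercept[k]
        buckets.append(min(max(int(y * n) + 1, 1), n))
    return buckets
```

As a usage example, `cdf_buckets(data, data[::10], 16)` assigns buckets using every tenth item as the sample and 16 sampling cells; both values are arbitrary choices for illustration.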

Figure II.5
Sampled Probability Density Curve
and Its Cumulative Distribution Function


Data Distribution


Sample Items into Cells


Figure II.7
Algorithm CDF

The following algorithm is placed after FINDMAXMIN in DPS:

integer sample
if length > sample then
begin    // CDF //
    integer k
    array CELL(0:m),M(m),B(m)
    // CELL - sampling cells
       m - number of cells
       M - slope of lines
       B - y-intercept of lines //

    FREQUENCY(L(top))
    // Take a frequency count of items assuming a
       uniform distribution into m cells ranging
       from min to max. Each frequency is initially 1. //

    CUMPROB(CELL,m)
    // Finds the cumulative probabilities of frequency
       cells //

    CELL(0):=0.0
    CELL(m):=0.9999999
    LINEFIT(CELL,m)
    // Fit m lines to m+1 points. Put slopes into M
       and y-intercepts into B //

    for i:=L(top),p while i>0 do
    begin
        p:=S(i)
        k:= sampling cell that A(i) belongs to
        j:= bucket given by the k-th line equation (M(k),B(k)) applied to A(i)
        // Transforms data into uniform distribution //
    end
end
else DPS(L(top))    // as usual //
COMPRESS(L,top,n,length)

As an example, suppose there is a skewed distribution
like that in Figure II.6. If there are 100 items being
distributed into 100 buckets, then note what happens to the
items that would normally have gone in cell #1. The
cumulative probability for this cell ranges from 0.0 to
0.7. Therefore 70% of the data falls into cell #1.
Plugging the values of the items belonging to cell number 1
into the first line equation and multiplying by 100 will now
yield bucket values from 1 to 70, instead of 1 to 20 as
would normally have occurred in DPS. This is the result
that is desired from adaptive methods.

II.3 Ranking Method

Now  the  question  arises:  Is  it  really  necessary  to 
find  out  anything  at  all  about  the  distribution?  In  fact, 
there  is  a  simple  method  by  which  to  adapt  the  partition 

lengths  to  a  particular  data  distribution  without  gathering 
information  about  the  distribution  itself.  This  can  be 
achieved  by  the  following  algorithm: 

Step  1)  Sort  a  sample  of  the  data  by  some  fast  method. 

Step 2) Divide the number of items sampled by the number
of cells (m/M), and then choose Partition Endpoints by
selecting every (m/M)th item. Note that M should be of
the form $2^k - 1$ to facilitate the binary search in step 3.
$E_0$ = min - 0.0000001, and $E_M$ = max.

Step 3) For each item in the file, perform a binary
search to find what cell it belongs in.

Step 4) Find the bucket it falls in by the expression:

$$j = \left\lceil \left( \frac{x - E_{k-1}}{E_k - E_{k-1}} + k - 1 \right) \cdot \frac{n}{M} \right\rceil$$

where k is the cell found
$E_k$ - k-th right side endpoint
$E_{k-1}$ - k-th left side endpoint
n - number of data points
M - number of cells

If the number of cells and the sample size are kept small and
fixed, then the overhead associated with the binary searching
and sorting will be a fixed constant factor. Figure II.8
outlines the method.
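
The ranking method can be sketched with a library binary search. The code below is an illustration under stated assumptions rather than the thesis implementation: `ranking_buckets` is a hypothetical name, the result is rounded up with a ceiling (an assumption consistent with the endpoint E_0 lying just below the minimum), and the sample is assumed to contain at least as many items as there are cells.

```python
import bisect
import math

EPSILON = 0.0000001


def ranking_buckets(data, sample, num_cells):
    """Return a bucket number in 1..len(data) for every item in data."""
    n = len(data)
    ordered = sorted(sample)                      # Step 1: sort the sample
    step = len(ordered) // num_cells              # Step 2: every step-th item
    ends = [min(data) - EPSILON]                  # E_0 lies just below the minimum
    ends += [ordered[k * step - 1] for k in range(1, num_cells)]
    ends.append(max(data))                        # E_M is the maximum

    buckets = []
    for x in data:
        # Step 3: smallest k with x <= ends[k], so ends[k-1] < x <= ends[k].
        # The max() guards the case where EPSILON is lost to rounding.
        k = max(bisect.bisect_left(ends, x), 1)
        # Step 4: position inside cell k, scaled to n / num_cells buckets per cell.
        fraction = (x - ends[k - 1]) / (ends[k] - ends[k - 1])
        j = math.ceil((fraction + k - 1) * n / num_cells)
        buckets.append(min(max(j, 1), n))
    return buckets
```

For example, `ranking_buckets(data, data[::10], 7)` uses every tenth item as the sample and 7 cells; these values are arbitrary and chosen only so that the cell count has the form 2^k - 1.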

It is hoped that an experimental analysis of these methods
will show that DPS can be improved to handle unknown
distributions.


Figure II.8
Algorithm RANKING


FINDMAXMIN(L(top))
integer sample
if length > sample then
begin
    integer k
    array E(0:m)

    QUICKSORT(sample,RA)
    // Order the sample array RA //

    for i:= 1 step 1 until m do E(i):= RA(i * sample/m)
    // Get every (sample/m)th item //

    E(0):=min2
    E(m):=max
    link:=L(top)
    L(top):=0

    for i:=link,p while i>0 do
    begin
        p:=S(i)
        k:=BINSEARCH(x,E)
        // Find cell x belongs to by binary search //
        j:= ⌈ ( (x-E(k-1)) / (E(k)-E(k-1)) + k-1 ) * length/m ⌉
        S(i):=L(j)
        L(j):=i
    end
end
else DPS(L(top))    // as usual //
COMPRESS(L,top,n,length)
CHAPTER III


EXPERIMENTAL  DESIGN  AND  ISSUES 

This  chapter  is  intended  to  describe  the  issues  and 

problems  in  designing  appropriate  experiments  for  DPS  and  the 

adaptive  methods.  It  could  also  serve  as  a  guide  for  other 

researchers  conducting  work  in  a  virtual  machine  environment 
where  algorithm  timings  are  needed.  The  last  part  of  this 
chapter  describes  the  reasons  for  choosing  various  parameters 

used  in  the  experiments. 


III.1 Experimental Problems and Issues
There  are  a  number  of  considerations  in  designing 
experiments  to  test  algorithms.  Lamagna,  Bass,  and  Anderson 
[LAMA80]  discuss  many  of  these  in  developing  a  research  plan  to 
study  the  performance  of  algorithms.  They  note  that  the 

programming  language  chosen  has  a  large  effect  on  how  well  a 
program  will  perform.  Various  compilers  will  generate  widely 


different  machine  code  as  would  be  the  case  for  COBOL,  FORTRAN, 
and  PL/I  compilers.  The  University  of  Rhode  Island  Academic 
Computer  Center  houses  a  National  Advanced  System/5  Model  7031 
which  is  an  IBM  3031  equivalent.  Due  to  the  advanced  features 
of  the  PL/I  Optimizing  Compiler,  PL/I  was  chosen  to  code  the 
aforementioned  algorithms.  As  will  be  seen,  PL/I  also  contains 
some  useful  compiler  options. 

The  machine  chosen  to  execute  on  plays  a  large  role  in  how 
fast  a  given  program  will  run.  For  example,  the  floating  point 


operations on a CDC machine are many times faster than on a
comparable IBM due to the hardware configuration of the
machines. Thus it should be noted that while one algorithm may
outperform another by a large factor on one machine, this may
not necessarily be true on another. This has been seen in
previous experiments conducted with DPS [DOBO79].

There  now  comes  a  problem  common  to  many  fields  of 
endeavor.  And  that  is,  the  extent  to  which  one  considers  the 
work  of  the  theorist  when  putting  a  concept  into  practice.  In 
computer  science,  this  problem  is  exemplified  by  the  conflict 
between  theoretical  order  of  magnitudes,  and  practical 
considerations  for  loop  control,  testing,  bookkeeping,  and 
memory  accesses.  These  latter  factors  can  contribute  a  high 
constant  of  proportionality  to  the  theoretic  order  of 
magnitude. A (2n log n + 3n) algorithm will, for example,
generally perform better than a (3n log n + 5n) algorithm, even
though they are both theoretically O(n log n).

This consideration lends itself to the issue of crossover
points, that is, the point where one algorithm begins
outperforming another. For example, a $2n^3$ algorithm is
better than a $50n^2$ algorithm for n<25, although for n>25 the
reverse is true. It will be seen where the crossover is for
DPS and Quicksort.
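
The crossover point in this example can be confirmed with a few lines of Python. The function names are hypothetical and the cost functions are simply the two expressions from the text.

```python
def cubic_cost(n):
    """Cost of the 2n^3 algorithm."""
    return 2 * n ** 3


def quadratic_cost(n):
    """Cost of the 50n^2 algorithm."""
    return 50 * n ** 2


# Smallest n at which the cubic algorithm is no longer cheaper.
crossover = next(n for n in range(1, 1000) if cubic_cost(n) >= quadratic_cost(n))
print(crossover)
```

Setting 2n^3 = 50n^2 and dividing by 2n^2 gives n = 25 directly; the search is only a check on that arithmetic.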

Lamagna, Bass, and Anderson [LAMA80] also point out that
in addition to these issues, an algorithm can be greatly
improved by various modifications, although the order of
magnitude stays the same. By utilizing clever data structures,
loop control, and insights, an algorithm can greatly improve
its performance as was the case with Quicksort. Another
example of this is the remarkable insight of Dobosiewicz in
transforming $O(n^2)$ Bubblesort into an algorithm which
outperforms Quicksort on input sizes less than 2000 [DOBO80]!
But while one constant factor may decrease due to a change,
another could increase. So the question is: At what point is
cleverness and extra overhead not worth it? There comes a
point where simplicity may outweigh efficiency.

III.2 Experimental Design

As mentioned, work for this thesis was done on a National
Semiconductor plug compatible IBM computer in a virtual memory
environment. Due to paging, cycle stealing, swapping, and load
on the computer, the run time of two identical experiments
could vary by up to 25%. Experiments were designed to
eliminate this undesirable "noise" from the run times.
Lamagna, Bass, and Anderson suggested determining weights for
straight line code in an algorithm and then counting how many
times each section was executed.

These estimates could prove to be inaccurate in practice if the weights are not chosen carefully. A method of obtaining fairly accurate run times was used in this work, and will now be described. The PL/I compiler used contains a COUNT option which produces a printout of how many times each statement is executed. This can easily be simulated in languages not containing this feature. There is also a LIST option which generates listings of assembly code similar to the machine code produced by the compiler. Using instruction timings available in the IBM System/370 Model 158 Functional Characteristics [IBM78], it is possible to calculate the timing for each instruction. Although this is tedious, it does yield accurate run times, and with some amount of work the process can be completely automated. By multiplying the count of each instruction by its timing, and then summing over the entire instruction set, a good estimate of the algorithm's run time can be achieved. In addition, the problems associated with job loads and virtual paging environments are non-existent.
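
The count-times-timing estimate described above can be sketched in a few lines. The instruction names, execution counts, and timings below are purely hypothetical placeholders chosen for illustration; they are not taken from the COUNT output or from the IBM timing tables used in this work.

```python
def estimated_run_time(counts, timings):
    """Sum (execution count * per-instruction timing) over all instructions."""
    return sum(counts[op] * timings[op] for op in counts)


# Hypothetical execution counts, e.g. as a statement-count listing might give.
counts = {"load": 1200, "add": 800, "compare": 500, "branch": 500}

# Hypothetical per-instruction timings in microseconds.
timings = {"load": 0.5, "add": 0.75, "compare": 0.5, "branch": 0.25}

total_microseconds = estimated_run_time(counts, timings)
print(total_microseconds)
```

Because the estimate depends only on counts and fixed timings, repeating it for the same input always gives the same figure, which is what removes the system "noise" from the comparison.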

There now remain a number of variables to be identified for the experimental design [MYER79].

Irrelevant Variables: System load, virtual memory paging, swapping, cycle stealing

Independent Variables:

    Quantitative: Input size

    Qualitative: Distribution type, algorithm used

    Fixed: Sample size, cell number, Insertionsort cutoff

    Random: Values in the random data file

Dependent Variables: Time

Benchmark Model (Controlled Experiment):

    Quicksort vs. DPS

Practical Requirements: No array size may exceed 32767 in PL/I. Time is money.
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Experimental  Design: 


Benchmark the Distributive Partitioning Sorting algorithm against Quicksort, and compare the results to those published in [DOBO79]. The version of Quicksort used is the Sedgewick implementation, and DPS uses an Insertionsort cutoff at 9 and a median chosen to be the mean of the max and min, or midrange, (max+min)/2.


Determine run times using

    Algorithms                  Distributions    Input Sizes

    DPS (mdrg)                  Uniform          500
      median = midrange                          1000
    DPS (median)                Normal           5000
      exact median selection                     10000
    Ranking                     Poisson          20000
    CDF                         Exponential      30000

Analyze

    . The effect of a distribution with an algorithm.
    . The effect of the input size on an algorithm.
    . An algorithm's performance against another algorithm
      within a distribution.

Each  experiment  consists  of  five  runs.  The  run  times  and 
various  percentages  are  taken  from  the  average  of  these  five 
runs.  Each  of  the  five  runs  contains  different  random  values 
as  data. 

It  should  be  noted  that  the  Poisson  distribution  is 
continuous  as  opposed  to  discrete  Poisson. 
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Due  to  the  nature  of  the  experimental  design,  there  are 
certain  constraints  on  what  can  be  said  about  the  conclusions 
to  be  reached.  Essentially  the  experiments  are  simulating  the 
run  time  as  if  the  program  were  being  given  stand  alone  time  on 
an IBM 370/158. In reality, operating system dependent factors
are  difficult  to  measure,  and  would  contribute  to  the  actual 
run  time.  But  these  have  been  eliminated  in  the  hope  of 
producing  good  relative  execution  time  results. 

Since single precision real numbers were used in previous DPS experiments, they were used here also. Due to the large number of arithmetic operations in DPS, changing the input stream to integer or double precision could radically change the timings. Alphanumeric keys would have to be adapted in some way so DPS could work with them. These are problems which do not occur in comparison-based sorts.

III.3 Discussion of Fixed Variables

There are three fixed variables that need values assigned to them. One of these is the Insertionsort cutoff point for DPS. The optimum cutoff for DPS(mdrg) in the uniform case was determined by starting with a value of 6 and incrementing by 1 until it was found. A cutoff of 9 or 10 was found to work best. Since 9 was known to be the cutoff for Quicksort, 9 was also chosen as the cutoff for DPS. It should be apparent that a different cutoff might be possible for each DPS algorithm, input size, and distribution. To avoid a lot of extra work to determine cutoffs for the two DPS methods, and out of fairness to DPS(mdrg) in the uniform case, 9 was used as the Insertionsort cutoff in all experiments. This value should also be optimum for the adaptive methods if they do indeed transform the distributions to a uniform spread.

Another fixed variable is the number of sampling cells used in the adaptive methods. The value chosen for this variable is closely related to the sample size. The number of cells and the sample size should be the same for Ranking and CDF out of fairness to each. Ranking has a binary search that requires that the number of cells be one less than a power of two. Good values to choose might be 7, 15, 31, 63, and 127. The higher the number, the more work the $O(\log n)$ binary search will have to do.

The idea of the CDF method is to sample the Cumulative Distribution Function. It would be desirable if the cells could sample statistically good proportions of the range of the data. If each cell samples a 1% or 2% proportion, then an appropriate number of cells can be chosen which divides the range into 1% or 3% intervals. Figure III.1 lists the possible cell proportions.

Figure III.1
Cell Proportions

    k     # Cells (2^k - 1)     Proportions (100 / # Cells)

    3            7                    14.29
    4           15                     6.66
    5           31                     3.23
    6           63                     1.59
    7          127                      .79
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Values  of  7  and  15  cells  would  not  divide  the  range  into 
small  enough  proportions  to  be  of  much  accuracy.  127  cells 
would  have  too  large  a  k  value  for  the  binary  search.  63  cells 
is not close to either 1% or 2% and it would be ambiguous to
choose  one  or  the  other. 

If 31 cells are chosen, the range is roughly divided into 3% proportions. This is small enough to have a good amount of accuracy, and efficient enough for use in a binary search.
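
The candidate cell counts and the share of the range that each cell covers can be recomputed directly. The sketch below assumes, as stated for the Ranking binary search, that the number of cells is one less than a power of two; the function name `cell_proportions` is hypothetical.

```python
def cell_proportions(k_values):
    """Return (k, cells, percent of range per cell) with cells = 2**k - 1."""
    rows = []
    for k in k_values:
        cells = 2**k - 1          # one less than a power of two
        rows.append((k, cells, 100.0 / cells))
    return rows


rows = cell_proportions(range(3, 8))
for k, cells, percent in rows:
    print(f"k={k}  cells={cells:3d}  proportion={percent:5.2f}%")
```

For k = 5 this gives 31 cells, each covering a little over 3% of the range, which is the choice discussed above.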

The  sample  size  now  needs  to  be  determined.  Using 

proportional  sample  statistics  and  standard  normal  distribution 

tables,  a  good  sample  size  can  be  found  if  1%  proportions  of 

the  data  are  desired. 

Given 3% proportions with .013 error and 90% confidence, a good sample size is 469. Since 31*15 is 465, a size of 465 was chosen. This also allows for easy adaptability to 15 cells if there is a large overhead with 31 cells.
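
The sample size of 469 can be reproduced with the usual formula for estimating a proportion, n = z^2 p (1 - p) / E^2. The text does not state the formula or the z value it used, so both are assumptions here: the sketch takes p = 0.03, E = 0.013, and z = 1.65 as a rounded table value for 90% confidence.

```python
import math


def proportion_sample_size(p, error, z):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= error."""
    return math.ceil(z * z * p * (1.0 - p) / (error * error))


n_required = proportion_sample_size(p=0.03, error=0.013, z=1.65)
n_chosen = 31 * 15   # a multiple of both candidate cell counts
print(n_required, n_chosen)
```

Under these assumptions the formula gives 469, and the chosen size of 465 falls only slightly short of it while dividing evenly into either 31 or 15 cells.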
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CHAPTER  IV 


RESULTS  AND  CONCLUSIONS 

IV.1 Expectations, Results, and Conclusions
Before  the  experiments  were  conducted,  one  might 
hypothesize  certain  results  to  occur.  As  already  discussed, 
the  adaptive  methods  should  transform  an  unknown  distribution 
into  a  uniform  distribution.  It  is  to  be  expected  that  in  many 
ways,  the  performance  of  these  methods  for  skewed  distributions 
will  resemble  the  DPS  methods  in  the  uniform  case.  The  only 
exception  here  would  be  the  run  time  differences  due  to 
overheads  in  the  adaptive  methods. 

In  the  uniform  case  these  methods  will  be  distributing 
items  into  buckets,  and  on  the  first  pass,  one  might  expect  a 
certain  percentage  of  the  buckets  to  be  used.  Certainly,  all 
of  the  buckets  will  not  get  used,  and  using  combinatorial 
analysis,  the  expected  percentage  of  buckets  used  can  be  found. 

Given one item and n buckets, the probability that the i-th bucket is empty is

$$\frac{n-1}{n} \qquad (1)$$

For all n items, the probability the i-th bucket is empty is

$$\left(\frac{n-1}{n}\right)^{n} \qquad (2)$$

So the percentage of buckets being used is

$$100 \cdot \left(1 - \left(\frac{n-1}{n}\right)^{n}\right) \qquad (3)$$

Since

$$\lim_{n \to \infty} \left(\frac{n-1}{n}\right)^{n} = \frac{1}{e} \qquad (4)$$

the expected percentage is

$$100 \cdot \left(1 - \frac{1}{e}\right) = 63.21\%$$
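
As a numerical check on this derivation, the finite-n percentage can be compared with its limit. The sketch assumes n items placed independently and uniformly into n buckets; the function name is hypothetical.

```python
import math


def expected_percent_used(n):
    """Expected percentage of n buckets holding at least one of n items."""
    return 100.0 * (1.0 - ((n - 1) / n) ** n)


limit_percent = 100.0 * (1.0 - 1.0 / math.e)

for n in (500, 1000, 5000, 10000, 20000, 30000):
    print(n, round(expected_percent_used(n), 2))
print("limit", round(limit_percent, 2))
```

The finite-n values sit slightly above the limit and approach it as n grows, so a figure near 63.2% is to be expected at every input size used in the experiments.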

Recall that one of the concerns of analyzing algorithms is the constant of proportionality of the theoretical order of magnitude. Determining these constants based on the observed run times should help determine where crossovers might occur, that is, at what input size one algorithm begins outperforming another.

Table 1.1 illustrates the benchmarking results. The Time Quicksort/Time DPS ratio indicates how much better DPS is performing than Quicksort. Comparing the results observed with those previously published in [DOBO79] (which appear in the Expected column), it can be seen that this benchmark of DPS outperforms previous results except for small sample sizes. Figure IV.1 illustrates these results graphically. Notice there is a crossover where DPS begins to outperform Quicksort somewhere below 2000 items.

Table 1.1

    Input      Time          Time        Time Quicksort / Time DPS
    Size       Quicksort     DPS         Expected    Observed    Difference

    2000        464.25       414.86      1.16        1.12        -.04
    5000       1302.88      1036.80      1.24        1.26         .02
    10000      2818.94      2051.62      1.27        1.37         .10
    15000      4420.06      3108.83      1.28        1.42         .14
    30000      9413.27      6222.61      1.35 *      1.51         .16
    50000                                1.46        1.63 *       .17

Sedgewick [SEDG78] noted that the expected run time of his implementation of Quicksort would be proportional to

10.6286 N log N + 2.116 N

In fact, the run times (in microseconds) observed are approximately double this formula. Since DPS is $O(n)$, one should expect to fit a linear expression to its run times. For these experiments, the expression

207N

works very well. To find the crossover point, the equations are set equal to one another and solved for N.

$$2\,(10.6286\,N \log N + 2.116\,N) = 207\,N$$

$$\log N = 9.5388$$

$$N = 744$$

which conforms well to Figure IV.1.
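
The crossover calculation can be checked numerically. The sketch below reads the two fitted expressions as 2(10.6286 N log N + 2.116 N) for Quicksort and 207N for DPS, both in microseconds, and assumes the logarithm is taken to base 2, since that is the base for which log N = 9.5388 corresponds to N = 744.

```python
import math


def quicksort_us(n):
    """Fitted Quicksort time: double Sedgewick's expression, base-2 log assumed."""
    return 2.0 * (10.6286 * n * math.log2(n) + 2.116 * n)


def dps_us(n):
    """Fitted DPS(mdrg) time."""
    return 207.0 * n


# Setting the two expressions equal and dividing through by N:
#   2 * (10.6286 * log2(N) + 2.116) = 207
log2_n = (207.0 / 2.0 - 2.116) / 10.6286
crossover_n = 2.0 ** log2_n
print(round(log2_n, 4), round(crossover_n))
```

Below the crossover the fitted Quicksort time is the smaller of the two, and above it the fitted DPS time is smaller.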

Although Quicksort accesses items directly, and DPS accesses items indirectly through a pointer list, DPS is still faster. In reality, one must consider that as the algorithms begin recursing, Quicksort will demonstrate a higher degree of locality than DPS in searching for items, and therefore generate fewer page faults. As pointed out earlier, this operating system factor does not play a role in these experiments.

The first four sets of tables to be presented list results of how well each algorithm performed on each of the distributions. It is expected that the DPS methods perform worse as the distribution becomes skewed, and that the adaptive methods will behave approximately the same throughout.

Tables 2.1 - 2.3 show results for DPS where the median is chosen to be (max+min)/2.

Table 2.1 is a table of the largest bucket sizes created on the first pass. The average of the largest buckets from the five runs and the maximum bucket size out of the five runs are listed. As expected, the sizes get larger as the distributions become more skewed.

Table 2.2 shows what percentage of buckets are used in the first pass through the data. This reflects how efficiently the algorithm is distributing the data into buckets. A lower percentage might indicate that the algorithm is doing a certain amount of recursion to handle the larger bucket sizes being created. The first column is the percentage of buckets with sizes greater than or equal to one, and the second column is for those with sizes greater than or equal to two. As was expected, DPS used fewer buckets as the distribution became skewed. In the uniform case the percentage of buckets used is around 63.1% - 63.5%. This corroborates the expected 63.21% derived earlier. It is slightly higher here due to the uneven partitioning of bucket intervals as a result of the median selection and distribution expressions. Interestingly, the percentage of buckets with at least 2 items did not vary greatly, whereas the percentage of buckets with at least one item varied between 16% and 64% throughout the entire series of experiments.

Table  2.3  lists  the  run  times  observed  for  DPS(mdrg).  The 
greatest  difference  was  observed  for  the  skewed  exponential 
case,  as  would  be  expected.  The  other  distributions  were 
fairly  consistent. 


Table 2. DPS(mdrg) Experiments

Table 2.1

Largest Bucket Sizes

              Uniform        Normal         Poisson        Exponential
              Avg.   Max.    Avg.   Max.    Avg.   Max.    Avg.   Max.

    500        4.4     5      7.2     8      6.8     8      9.4    12
    1000       5.4     6      8.2    11      7.8     9     12.6    14
    5000       6.2     7      9.8    11      9.4    10     17.4    19
    10000      6.4     8      9.8    11     10.2    13     18.6    21
    20000      6.4     7     10.0    11     10.6    12     21.4    23
    30000      7.4     8     10.2    11     10.8    11     23.0    24

Table 2.2

% of Filled Buckets (First Pass)

              Uniform         Normal          Poisson         Exponential
              %>=1   %>=2     %>=1   %>=2     %>=1   %>=2     %>=1   %>=2

    500       63.84  26.32    49.68  26.80    49.48  28.08    40.20  22.76
    1000      63.70  26.38    48.82  27.00    50.00  27.74    33.74  20.68
    5000      63.48  26.33    46.24  27.10    45.16  27.72    30.05  19.42
    10000     63.18  26.53    45.37  26.95    42.54  27.33    27.85  18.34
    20000     63.28  26.33    45.10  26.98    41.68  27.21    25.99  17.44
    30000     63.43  26.38    44.58  26.92    41.63  27.11    24.30  16.64

Table 2.3

Run Times (millisecs)

              Uniform     Normal      Poisson     Exponential

    500        103.65      104.44      104.33      108.22
    1000       201.54      208.90      208.55      223.38
    5000      1036.80     1043.37     1045.45     1160.49
    10000     2051.62     2085.85     2095.75     2369.29
    20000     4144.05     4170.01     4195.50     4825.32
    30000     6222.61     6247.89     6362.22     7159.07

Tables 3.1-3.3 illustrate observations for DPS which employs the Floyd-Rivest expected time 1.5n exact median selection algorithm. It would be expected that while the overall efficiency might improve, there would be a certain amount of overhead in run times associated with the median selection.

Overall these tables demonstrate the same characteristics Tables 2.1-2.3 did. There were two major differences to be noted. Table 3.2 shows that while there was a tendency for the algorithm to distribute items less efficiently for skewed distributions, the Poisson data was slightly more efficient than the normal data. This is because Poisson generated more buckets with exactly size 1, and fewer with at least 2, than the normal case.

The other observation to be made is found in Table 3.3. For small input sizes, DPS(median) performed better for the skewed distributions than for the uniform cases. This is mostly due to fewer buckets that need to be handled for skewed data. As a result, the algorithm runs slightly better.

Better run times for the normal distribution over the uniform distribution were not observed, as they were in [DOBO78a].

In Tables 4.1-4.3, data for the Ranking Method is presented. A concern for this method is that it performs consistently through the various distributions.



Table 3. DPS(median) Experiments

Table 3.1

Largest Bucket Sizes

              Uniform        Normal         Poisson        Exponential
              Avg.   Max.    Avg.   Max.    Avg.   Max.    Avg.   Max.

    500        4.6     5      7.2     9      7.2     9      8.8    11
    1000       5.6     6      8.0     9      7.8    10     11.6    14
    5000       6.2     7     10.0    11     10.0    12     15.0    17
    10000      6.6     7     10.0    12     12.0    14     17.6    21
    20000      6.8     7     10.4    12     13.8    15     19.8    22
    30000      7.2     9     10.6    11     12.2    14     21.8    27

Table 3.2

% of Filled Buckets (First Pass)

              Uniform         Normal          Poisson         Exponential
              %>=1   %>=2     %>=1   %>=2     %>=1   %>=2     %>=1   %>=2

    500       63.24  26.52    50.56  26.96    53.76  25.96    52.88  24.92
    1000      63.54  26.48    48.82  27.24    54.32  26.66    49.38  24.14
    5000      63.30  26.33    46.25  27.01    51.87  25.68    46.46  23.05
    10000     63.31  26.31    45.40  26.96    50.27  25.43    45.76  22.74
    20000     63.42  26.37    45.17  26.85    49.80  25.30    44.88  22.14
    30000     63.34  26.44    44.58  26.90    49.74  25.10    44.06  21.64

Table 3.3

Run Times (millisecs)

              Uniform     Normal      Poisson     Exponential

    500        145.16      134.40      133.66      139.74
    1000       257.47      257.58      254.11      270.08
    5000      1232.72     1241.65     1240.33     1296.61
    10000     2439.84     2454.41     2494.27     2652.15
    20000     4823.61     4847.84     4927.75     5368.40
    30000     7210.15     7246.58     7338.22     8171.92
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Table  4.1  conforms  well  to  this  expectation.  The  bucket 
sizes did not vary greatly in the Exponential cases as compared to the DPS programs. Table 4.2 more vividly shows that the algorithm is behaving consistently. For each distribution, the percentages remained fairly constant. The percentage of buckets with at least one item came very close to the predicted 63.21%. Table 4.3 illustrates that the run times also behave very consistently. There is very little run time variance across distributions. Overall it can be concluded that the Ranking Method is a valid Adaptive Method for DPS and deserves further consideration.

Last in this series are Tables 5.1-5.3 for the Cumulative Distribution Function Method. It can easily be seen in these tables how well the algorithm performs across distributions. Each run performs equally well regardless of skewness. This strongly supports the theory that the sample Cumulative Distribution Function effectively transforms an unknown distribution into a uniform distribution.

The next three tables, 6.1-6.3, show the number of second level passes using recursion that were needed for each experiment. Uniform cases are not listed because none of the experiments ever recursed to the second level. It should also be noted that none of the experiments ever recursed to the third level. Two passes on a bucket always sufficed to do the sorting.
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Table  4.  Ranking  Experiments 
Table 4.1

Largest Bucket Sizes

              Uniform        Normal         Poisson        Exponential
              Avg.   Max.    Avg.   Max.    Avg.   Max.    Avg.   Max.

    500        4.2     5      5.0     6      5.4     6      4.6     5
    1000       6.0     7      6.2     7      5.8     7      7.0    10
    5000       6.8     8      8.6    12      9.8    13      9.4    13
    10000      6.6     7      9.4    13     12.6    15     10.6    17
    20000      7.0     8     10.2    11     14.6    17     13.6    21
    30000      7.2     8     10.2    11     15.4    17     14.2    23

Table 4.2

% of Filled Buckets (First Pass)

              Uniform         Normal          Poisson         Exponential
              %>=1   %>=2     %>=1   %>=2     %>=1   %>=2     %>=1   %>=2

    500       63.64  26.68    64.12  25.20    61.88  26.04    63.16  26.88
    1000      63.78  25.52    62.26  25.96    62.02  26.86    62.48  25.76
    5000      62.57  26.34    61.70  26.20    61.55  26.28    62.02  26.27
    10000     62.36  26.48    61.28  26.15    61.20  26.17    61.53  26.22
    20000     62.49  26.43    61.16  26.29    61.08  26.18    61.61  26.17
    30000     62.49  26.40    60.94  26.40    61.22  26.05    61.51  26.18

Table 4.3

Run Times (millisecs)

              Uniform      Normal       Poisson      Exponential

    500         257.96       257.94       256.20       258.33
    1000        428.78       427.58       425.91       428.30
    5000       1789.90      1785.12      1784.10      1788.02
    10000      3485.41      3479.50      3483.73      3487.98
    20000      6857.82      6869.38      6879.54      6889.47
    30000     10277.25     10251.06     10273.79     10294.71

Table 5. CDF Experiments


Table 5.1

Largest Bucket Sizes

              Uniform        Normal         Poisson        Exponential
              Avg.   Max.    Avg.   Max.    Avg.   Max.    Avg.   Max.

    500        4.6     6      4.6     5      5.6     7      4.8     5
    1000       5.6     7      5.6     7      5.2     6      5.2     6
    5000       6.8     7      6.4     7      6.6     7      6.4     8
    10000      6.8     8      6.8     7      6.4     7      6.6     7
    20000      7.0     7      7.8     9      7.4     8      7.4     8
    30000      7.4     8      7.4     9      8.0    10      8.0     9

Table 5.2

% of Filled Buckets (First Pass)

              Uniform         Normal          Poisson         Exponential
              %>=1   %>=2     %>=1   %>=2     %>=1   %>=2     %>=1   %>=2

    500       63.92  26.60    64.36  25.92    62.60  26.00    63.80  26.24
    1000      63.32  26.66    63.08  26.08    62.08  27.00    63.02  26.30
    5000      62.61  26.20    62.43  26.42    62.21  26.54    62.23  26.34
    10000     62.43  26.17    62.30  26.46    62.25  26.49    62.08  26.59
    20000     62.35  26.30    62.29  26.38    62.13  26.62    62.02  26.76
    30000     62.50  26.27    62.09  26.60    62.08  26.62    61.89  26.74

Table 5.3

Run Times (millisecs)

              Uniform     Normal      Poisson     Exponential

    500        127.37      127.53      127.50      127.46
    1000       234.27      233.86      233.70      233.93
    5000      1085.52     1083.71     1084.46     1084.25
    10000     2148.87     2145.85     2148.11     2147.49
    20000     4274.07     4265.47     4274.10     4272.52
    30000     6398.65     6378.67     6399.72     6398.81

A couple of observations can be made about the nature of the data presented in these tables. The number of second level passes is a good indication of how much work an algorithm is doing. The fewer passes, the less work being performed. The number of second level passes is a direct result of how well the data was distributed in the first pass. The results of these tables compare well with the run times observed in Tables 7.0-10.0.

As expected, the number of second level passes increased within an algorithm as the data became more skewed. Another observation is that Ranking needed only a small number of passes, and CDF did not use a second level of recursion except in one experiment. In this respect, CDF far outperformed the other algorithms. Again, this further supports the theory of the Cumulative Distribution Function acting as a uniform transformation. This is true to a lesser extent for the Ranking algorithm.

The next four series of tables present how the algorithms performed in any one distribution. These are especially helpful in showing how the algorithms are competing against one another.

The first three tables, 7.1-7.3, show the Uniform case. As can be seen in 7.1 and 7.2, all algorithms appear to be performing equally well with respect to the uniform case. However, Table 7.3 shows the first large discrimination between the methods. The second column of figures in these run times
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Table  6 


Number of Second Level Recursions

Table 6.1

Normal

              DPS mdrg       DPS median     Ranking        CDF
              Avg.   Max.    Avg.   Max.    Avg.   Max.    Avg.   Max.

    500        0       0      0       0      0       0      0       0
    1000        .2     1      0       0      0       0      0       0
    5000       1.2     2      1.0     2       .2     1      0       0
    10000      1.4     4      1.0     3       .4     1      0       0
    20000      2.2     5      2.0     3      1.8     5      0       0
    30000      2.0     3      3.4     5      1.4     3      0       0

Table 6.2

Poisson

              DPS mdrg       DPS median     Ranking        CDF
              Avg.   Max.    Avg.   Max.    Avg.   Max.    Avg.   Max.

    500        0       0      0       0      0       0      0       0
    1000       0       0       .2     1      0       0      0       0
    5000        .6     1      1.2     3       .6     1      0       0
    10000      1.8     4      7.6    13      1.8     3      0       0
    20000      3.0     7     15.8    25      4.6     7      0       0
    30000      4.6     7     24.4    36      6.0    13       .2     1

Table 6.3

Exponential

              DPS mdrg        DPS median      Ranking        CDF
              Avg.    Max.    Avg.    Max.    Avg.   Max.    Avg.   Max.

    500         1.2     3       1.2     3      0       0      0       0
    1000        7.2    14       2.4     5       .2     1      0       0
    5000       69.6    92      27.0    43       .8     3      0       0
    10000     171.2   258      74.6   113      2.6     9      0       0
    20000     431.4   519     195.2   237      6.6    23      0       0
    30000     763.0   786     351.6   363     11.8    31      0       0



represent the percentage improvement of DPS(mdrg) over the given algorithm. For example, a 1.19 means that DPS(mdrg) runs 19% faster than the given algorithm in that experiment. The conclusion to be reached from Table 7.3 is that as the sample gets larger, DPS(mdrg) is about 16% faster than DPS(median), 65% faster than Ranking, and 3% faster than CDF. The Uniform experiment times are represented graphically in Figure IV.2. (Since the Normal and Poisson experiments have relatively the same proportions as Uniform, as seen in Tables 8 and 9, this graph would be similar in those distributions as well.)

The reason for these time differences can be explained by the overhead associated with each method as compared to DPS(mdrg). Formulas can be fit to the run times to approximate what the constants of proportionality are. These expressions yield times in microseconds, and fit better as the sample size increases.

                   Time (microseconds)          Space

    DPS(mdrg)      207N                         2N
    CDF            207N + 5.6N + 22600          2N + 3M
    DPS(median)    207N + 31.6N + 52000         2N + N
    Ranking        207N + 132N + 96255          2N + M + m

    N = # items, M = # cells, m = sample size

    CDF Overhead:      Sample frequency, line fits, and
                       larger partitioning expression
    DPS Overhead:      Median selection
    Ranking Overhead:  Quicksort, binary search, and
                       complex partitioning expression
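
The fitted expressions can be evaluated and set against the measured uniform-case run times. The sketch below reads the expressions as 207N, 207N + 5.6N + 22600, 207N + 31.6N + 52000, and 207N + 132N + 96255 microseconds, and uses the observed 30000-item times of 6222.61, 6398.65, 7210.15, and 10277.25 milliseconds; the dictionary and function names are hypothetical.

```python
# (slope in microseconds per item, constant overhead in microseconds)
FITS = {
    "DPS(mdrg)":   (207.0, 0.0),
    "CDF":         (207.0 + 5.6, 22600.0),
    "DPS(median)": (207.0 + 31.6, 52000.0),
    "Ranking":     (207.0 + 132.0, 96255.0),
}

# Observed uniform-case run times for 30000 items, in milliseconds.
OBSERVED_MS_30000 = {
    "DPS(mdrg)": 6222.61,
    "CDF": 6398.65,
    "DPS(median)": 7210.15,
    "Ranking": 10277.25,
}


def fitted_ms(method, n):
    """Evaluate the fitted expression for a method and convert to milliseconds."""
    slope, overhead = FITS[method]
    return (slope * n + overhead) / 1000.0


for method, observed in OBSERVED_MS_30000.items():
    fitted = fitted_ms(method, 30000)
    print(f"{method:12s} fitted={fitted:9.2f}  observed={observed:9.2f}")
```

At this input size each fitted value lies within about one percent of the measured time, in line with the remark that the expressions fit better as the sample size increases.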
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Table  7.  Uniform  Experiments 
Table 7.1

Largest Bucket Sizes

              DPS mdrg       DPS median     Ranking        CDF
              Avg.   Max.    Avg.   Max.    Avg.   Max.    Avg.   Max.

    500        4.4     5      4.6     5      4.2     5      4.6     6
    1000       5.4     6      5.6     6      6.0     7      5.6     7
    5000       6.2     7      6.2     7      6.8     8      6.8     7
    10000      6.4     8      6.6     7      6.6     7      6.8     8
    20000      6.4     7      6.8     7      7.0     8      7.0     7
    30000      7.4     8      7.2     9      7.2     8      7.4     8

Table 7.2

% of Filled Buckets (First Pass)

              DPS mdrg        DPS median      Ranking         CDF
              %>=1   %>=2     %>=1   %>=2     %>=1   %>=2     %>=1   %>=2

    500       63.84  26.32    63.24  26.52    63.64  26.68    63.92  26.60
    1000      63.70  26.38    63.54  26.48    63.78  25.52    63.32  26.66
    5000      63.48  26.33    63.30  26.33    62.57  26.34    62.61  26.20
    10000     63.18  26.53    63.31  26.31    62.36  26.48    62.43  26.17
    20000     63.28  26.33    63.42  26.37    62.49  26.43    62.35  26.30
    30000     63.43  26.38    63.34  26.44    62.49  26.40    62.50  26.27

Table 7.3

Run Times (millisecs)

              DPS mdrg    DPS median   Dmed/     Ranking     Ranking/   CDF        CDF/
                                       Dmdrg                 Dmdrg                 Dmdrg

    500        103.65      145.16      1.40        257.96    2.49        127.37    1.23
    1000       201.54      257.47      1.28        428.78    2.18        234.27    1.16
    5000      1036.80     1232.72      1.19       1789.90    1.73       1085.52    1.05
    10000     2051.62     2439.84      1.16       3485.41    1.70       2148.87    1.05
    20000     4144.05     4823.61      1.16       6857.82    1.66       4274.07    1.03
    30000     6222.61     7210.15      1.16      10277.25    1.65       6398.65    1.03

[Figure: plot with horizontal axis "input size (thousands)"; one curve is labelled "DPSm"]

It is here that the importance of constants of proportionality in orders of magnitude is truly appreciated. Although the overhead for Ranking is very high due to the binary searching and initial sorting, only an estimated 20% can be saved on the run time if a smaller sample size and 15 sampling cells are used.

It now becomes interesting to see what happens as the data becomes more skewed. Tables 8.1-8.3 describe the Normal case. Table 8.1 shows that the adaptive methods have smaller bucket sizes. Table 8.2 illustrates how efficiently the adaptive methods distribute the items for buckets with at least 1 item. The efficiency is better by roughly 15%. Table 8.3 shows that the run times are in the same proportion as they were for the uniform case.

Tables 9.1-9.3 illustrate the Poisson experiments. The results here are much like those of the Normal experiments and the same conclusions can be reached.

Tables 10.1-10.3 list the results of the experiments with an Exponential distribution. Table 10.1 shows that the adaptive methods outperform the DPS methods, and Table 10.2 shows that the adaptive methods distribute items much more efficiently. However, the major conclusion to be reached is in Table 10.3. For input sizes greater than about 2500, CDF outperforms the DPS(mdrg) algorithm. For 20000 to 30000 items,
it  runs  about  \7%  better.  Figure  IV. 3  illustrates  the  data  in 
Table  10.3 . 
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Table  8.  Normal  Experiments 
Table 8.1

Largest Bucket Sizes

              DPS mdrg       DPS median     Ranking        CDF
              Avg.   Max.    Avg.   Max.    Avg.   Max.    Avg.   Max.

    500        7.2     8      7.2     9      5.0     6      4.6     5
    1000       8.2    11      8.0     9      6.2     7      5.6     7
    5000       9.8    11     10.0    11      8.6    12      6.4     7
    10000      9.8    11     10.0    12      9.4    13      6.8     7
    20000     10.0    11     10.4    12     10.2    11      7.8     9
    30000     10.2    11     10.6    11     10.2    11      7.4     9

Table 8.2

% of Filled Buckets (First Pass)

              DPS mdrg        DPS median      Ranking         CDF
              %>=1   %>=2     %>=1   %>=2     %>=1   %>=2     %>=1   %>=2

    500       49.68  26.80    50.56  26.96    64.12  25.20    64.36  25.92
    1000      48.82  27.00    48.82  27.24    62.26  25.96    63.08  26.08
    5000      46.24  27.10    46.25  27.01    61.70  26.20    62.43  26.42
    10000     45.37  26.95    45.40  26.96    61.28  26.15    62.30  26.46
    20000     45.10  26.98    45.17  26.85    61.16  26.29    62.29  26.38
    30000     44.58  26.92    44.58  26.90    60.94  26.40    62.09  26.60

Table 8.3

Run Times (millisecs)

              DPS mdrg    DPS median   Dmed/     Ranking     Ranking/   CDF        CDF/
                                       Dmdrg                 Dmdrg                 Dmdrg

    500        104.44      134.40      1.29        257.94    2.47        127.53    1.22
    1000       208.90      257.58      1.23        427.58    2.05        233.86    1.12
    5000      1043.37     1241.65      1.19       1785.12    1.71       1083.71    1.04
    10000     2085.85     2454.41      1.18       3479.50    1.67       2145.85    1.03
    20000     4170.01     4847.84      1.16       6869.38    1.65       4265.47    1.02
    30000     6247.89     7246.58      1.16      10251.06    1.64       6378.67    1.02
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Table  9.  Poisson  Experiments 
Table 9.1

Largest Bucket Sizes

            DPS mdrg      DPS median      Ranking         CDF
     n     Avg.   Max.    Avg.   Max.    Avg.   Max.   Avg.   Max.
   500      6.8     8      7.2     9      5.4     6     5.6     7
  1000      7.8     9      7.8    10      5.8     7     5.2     6
  5000      9.4    10     10.0    12      9.8    13     6.6     7
 10000     10.2    13     12.0    14     12.6    15     6.4     7
 20000     10.6    12     13.8    15     14.6    17     7.4     8
 30000     10.8    11     12.2    14     15.4    17     8.0    10

Table 9.2

% of Filled Buckets (First Pass)

            DPS mdrg       DPS median       Ranking          CDF
     n     %>=1   %>=2    %>=1   %>=2    %>=1   %>=2    %>=1   %>=2
   500    49.48  28.08   53.76  25.96   61.88  26.04   62.60  26.00
  1000    50.00  27.74   54.32  26.66   62.02  26.86   62.08  27.00
  5000    45.16  27.7?   51.87  25.68   61.55  26.28   62.21  26.54
 10000    42.54  27.33   50.27  25.43   61.20  26.17   62.25  26.49
 20000    41.68  27.21   49.80  25.30   61.08  26.18   62.13  26.62
 30000    41.63  27.11   49.74  25.10   61.22  26.05   62.08  26.62

Table 9.3

Run Times (millisecs)

                                  Dmed               Ranking               CDF
     n    DPS mdrg  DPS median   -----    Ranking    -------      CDF     -----
                                 Dmdrg                Dmdrg               Dmdrg
   500     104.33     133.66     1.28      256.20      2.46     127.50    1.22
  1000     208.55     254.11     1.22      425.91      2.04     233.70    1.12
  5000    1045.45    1240.33     1.19     1784.10      1.71    1084.46    1.04
 10000    2095.75    2494.27     1.19     3483.73      1.66    2148.11    1.02
 20000    4195.50    4927.75     1.17     6879.54      1.64    4274.10    1.02
 30000    6362.22    7338.22     1.15    10273.89      1.61    6399.72    1.006
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Table  10.  Exponential  Experiments 


Table 10.1

Largest Bucket Sizes

            DPS mdrg      DPS median      Ranking         CDF
     n     Avg.   Max.    Avg.   Max.    Avg.   Max.   Avg.   Max.
   500      9.4    12      8.8    11      4.6     5     4.3     5
  1000     12.6    14     11.6    14      7.0    10     5.2     6
  5000     17.4    19     15.0    17      9.4    13     6.4     8
 10000     18.6    21     17.6    21     10.6    17     6.6     7
 20000     21.4    23     19.8    22     13.6    21     7.4     8
 30000     23.0    24     21.8    27     14.2    23     8.0     9

Table 10.2

% of Filled Buckets (First Pass)

            DPS mdrg       DPS median       Ranking          CDF
     n     %>=1   %>=2    %>=1   %>=2    %>=1   %>=2    %>=1   %>=2
   500    40.20  22.76   52.88  24.92   63.16  26.88   63.80  26.24
  1000    33.74  20.68   49.38  24.14   62.48  25.76   63.02  26.30
  5000    30.05  19.42   46.46  23.05   62.02  26.27   62.23  26.34
 10000    27.85  18.34   45.76  22.74   61.53  26.22   62.08  26.59
 20000    25.99  17.44   44.88  22.14   61.61  26.17   62.02  26.76
 30000    24.30  16.64   44.06  21.64   61.51  26.18   61.89  26.74

Table 10.3

Run Times (millisecs)

                                  Dmed               Ranking               CDF    Dmdrg
     n    DPS mdrg  DPS median   -----    Ranking    -------      CDF     -----   -----
                                 Dmdrg                Dmdrg               Dmdrg    CDF
   500     108.22     139.74     1.29      258.33      2.39     127.46    1.18
  1000     223.38     270.08     1.21      428.30      1.92     233.93    1.05
  5000    1160.49    1296.61     1.12     1788.02      1.54    1084.25     .93    1.07
 10000    2369.29    2652.15     1.12     3487.98      1.47    2147.49     .91    1.10
 20000    4825.32    5368.40     1.11     6889.47      1.43    4272.52     .89    1.13
 30000    7159.07    8171.92     1.14    10294.71      1.44    6398.81     .89    1.12
Tables 11.1-11.3 demonstrate how well the algorithms run
against DPS(mdrg) across the distributions. It is interesting
to note the consistency of the percentages across
distributions. Tables 11.1-11.3 are illustrated graphically in
Figure IV.4.

IV.?  Summary  of  Conclusions 

The reader may have noticed that up to now experiments
have dealt with various types of distributions, and little has
been done with the worst case. An exponential distribution
exemplifies a typical bad case of data for DPS. Where the
worst case for Quicksort is a realistic sorted set of items,
the worst case for DPS is an impractical set of factorials
[DOBO79]. This is by no means a typical case. For these
reasons this author feels that worst case experimentation is
justified only as a curiosity factor, rather than of any
practical importance. The adaptive methods have more than
proved themselves on the skewed distributions given to them as
input.

It was pointed out at the beginning of this paper that an
algorithm should be measured on a number of criteria. Thus
far, the algorithms have been thoroughly analyzed for
theoretical and practical time and space considerations.
Additionally, they should be easily understood, implemented,
and maintained. DPS(median) has a very complex median
selection algorithm, and Ranking has a lengthy initial
Quicksort and a cumbersome binary search to execute. The CDF
algorithm, on the other hand, is quite simple minded in its
approach, which makes it readily comprehensible. This
simplicity lends itself to competitive run times with DPS(mdrg).

Table 11

Run Time Percentages

These tables indicate how much longer it takes an algorithm to
run for the given distribution as compared to DPS mdrg on that
distribution.

Table 11.1

DPS median

           DPSmdrg    DPSmdrg              DPSmdrg              DPSmdrg
     n     Uniform     Normal    Normal    Poisson   Poisson      Exp.     Exp.
   500      103.65     104.44     1.29      104.33     1.28      108.22    1.29
  1000      201.54     208.90     1.23      208.55     1.22      223.38    1.21
  5000     1036.80    1043.37     1.19     1045.45     1.19     1160.49    1.12
 10000     2051.62    2085.85     1.18     2095.75     1.19     2369.29    1.12
 20000     4144.05    4170.01     1.16     4195.50     1.17     4825.32    1.11
 30000     6222.61    6247.89     1.16     6362.22     1.15     7159.07    1.14

Table 11.2

Ranking

     n     Normal    Poisson    Exponential
   500      2.47       2.46        2.39
  1000      2.05       2.04        1.92
  5000      1.71       1.71        1.54
 10000      1.67       1.66        1.47
 20000      1.65       1.64        1.43
 30000      1.64       1.61        1.44

Table 11.3

CDF

     n     Normal    Poisson    Exponential
   500      1.22       1.22        1.18
  1000      1.12       1.12        1.05
  5000      1.04       1.04         .93  (1.07)
 10000      1.03       1.02         .91  (1.10)
 20000      1.02       1.02         .89  (1.13)
 30000      1.02       1.006        .89  (1.12)

Figure IV.4

Run Time Percentages

For larger input sizes, the CDF algorithm runs to within
4% of DPS(mdrg), and actually outperforms it by 12% in
exponential and more skewed cases. Smaller inputs take only a
fraction of a second to sort, so overhead is not an important
consideration here. When using CDF, we are guaranteed that any
unknown distribution will be sorted as quickly and efficiently
as though it were a uniform case, and the sorting can be done
about as cheaply as the fastest available DPS method.
Therefore, there is little to lose, and possibly something to
gain, by implementing the Cumulative Distribution Function
Adaptive Method for Distributive Partitioning Sorting. It is
well worth using.
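
The general idea behind a CDF-driven distribution pass can be
illustrated with a short sketch. This is not the implementation
measured in the tables above; it is a minimal illustration that
assumes the CDF is estimated from a sorted random sample of the
input. The function name cdf_bucket_sort and the parameters
sample_size and seed are hypothetical.

```python
import bisect
import random


def cdf_bucket_sort(items, sample_size=64, seed=0):
    """Sort numbers by distributing them into n buckets through an
    empirical CDF estimated from a random sample (illustrative only)."""
    items = list(items)
    n = len(items)
    if n < 2:
        return items
    rng = random.Random(seed)
    # Assumed estimator: a sorted sample stands in for the unknown CDF.
    sample = sorted(rng.sample(items, min(sample_size, n)))
    buckets = [[] for _ in range(n)]
    for x in items:
        # Fraction of the sample <= x approximates F(x) in [0, 1].
        f = bisect.bisect_right(sample, x) / len(sample)
        # F is non-decreasing in x, so bucket order follows key order.
        buckets[min(n - 1, int(f * n))].append(x)
    out = []
    for b in buckets:
        b.sort()  # stands in for the recursive pass / Insertionsort cutoff
        out.extend(b)
    return out
```

Because the estimated CDF is a step function of the sample, a
small sample gives coarse buckets; interpolating between sample
points would spread the items more evenly, at some extra cost.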


CHAPTER  V 


CONSIDERATIONS  FOR  THE  FUTURE 

With the Cumulative Distribution Function adaptation,
Distributive Partitioning Sorting is an extremely efficient and
valuable sorting technique. It easily outperforms Quicksort
and other "fast" sorting algorithms. But there remain a number
of aspects in which DPS may be even further improved, and a
number of areas in which it has future implications.

V.1 Modifications

Some modifications can be suggested to improve the
efficiency of DPS. If DPS(mdrg) or DPS(median) is used knowing
that the data will typically be symmetrically distributed, then
it is not necessary to select a median. All that is needed is
to partition the range into n buckets and distribute the
items. The median is so close to the mean for these
distributions that it is not worth finding or using. If one
insists on choosing a median quickly, it would be sufficient to
choose the median of a small sample of the data, rather than
the entire data set.

CDFDPS does not choose a median. But like the other
methods, it has an Insertionsort cutoff. For these experiments
Insertionsort was used for bucket sizes of 9 or less. Since
about 63.2% of the buckets are used, it would be practical to
use some fraction of the partitions. This is because many
buckets with one item can be combined, and still come under the
cutoff. The same basic idea was suggested in [KNUT73,
DOBO79]. It would be worthwhile to see if there is an optimum
number of buckets to use given the cutoff. This could result
in a substantial space savings.

V.2 Implications

As  is  the  case  with  other  sorting  algorithms,  there  is 
some  question  as  to  whether  DPS  is  practical  for  machines  other 
than  large  mainframes.  On  microcomputers,  if  large  inputs  are 
used,  the  answer  is,  of  course,  no,  due  to  memory  size 
limitations  and  slow  processing  speeds.  But  with  the  recent 
advances  in  mass  storage  and  CPU  speeds  on  micros,  it  might  not 
be  long  before  large  scale  programs  become  reality  on  small 
computers.

There  are  practical  space  problems  on  minicomputers  as 
well,  which  are  largely  a  function  of  the  amount  of  available 
space  and  the  load  on  the  machine.  Theoretically  there  is  no 
reason  why  DPS  could  not  be  implemented  on  a  mini.  In  reality, 
in addition to DPS's space overhead, there would be many system
and  user  dependent  factors  affecting  its  performance.  At  some 
point  it  may  become  advantageous  to  resort  to  an  external  sort 
should  system  resources  become  too  limited. 

It would be very worthwhile to examine adapting DPS to
handle alphanumeric keys. This would be of great practical
concern for the data processing community, since most sorting
in reality is done on name fields of one type or another. The
main concern would be to keep DPS fast, simple, and competitive
with other algorithms.

It is fitting here to cite previous work in the
applications of the idea of distributive partitioning. Just as
the basic idea of partitioning in Quicksort was used by Floyd
for selection, so Allison and Noga have suggested using
distributive partitioning in selection [ALLI80]. Van der Nat
has suggested adapting distributive partitioning in binary
merging and merge sorting applications [VAN79, VAN80]. And, as
mentioned earlier, Meijer and Akl have developed a Hybrid of
DPS which uses a CDF for known distributions [MEIJ80].

CDFDPS could be generalized to sort n-dimensioned arrays.
A CDF in n-dimensions is defined to be:

$$G(x_1, x_2, \ldots, x_n) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n)$$

Finding cumulative frequency probabilities is easily expanded
to n-dimensions. Since this function can be considered
monotonic increasing in n-dimensions, the resulting
transformation from one n-dimensional space to another will be
uniform. It should be relatively easy to implement CDFDPS for
multi-dimensioned arrays.

Perhaps another way to use the basic idea of CDF
distributive partitioning is in hashing applications. An item
can be hashed using distributive partitioning for fast lookup
and retrieval in databases. Collisions could be handled in any
number of ways described in database theory. The hope is that
a very fast and simple mechanism can be developed for information
storage and retrieval systems. The hashing process would of
course be O(1)!
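
A hedged sketch of such a hash function follows. It assumes the
CDF is estimated from a sorted sample of representative keys;
make_cdf_hash, sample, and table_size are hypothetical names,
and collision handling is left out. Note that the binary search
costs time logarithmic in the sample size, which is constant
only because the sample size is fixed in advance.

```python
import bisect


def make_cdf_hash(sample, table_size):
    """Return a hash function that maps a key to a slot through an
    empirical CDF built from `sample` (an assumed estimator)."""
    ordered = sorted(sample)

    def cdf_hash(key):
        # Estimated F(key): fraction of sample keys <= key.
        f = bisect.bisect_right(ordered, key) / len(ordered)
        # Scale to a slot index and clamp the F(key) == 1 case.
        return min(table_size - 1, int(f * table_size))

    return cdf_hash
```

Unlike a conventional hash, this mapping preserves key order, so
neighbouring slots hold neighbouring keys.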


V.3 In Conclusion...

Distributive  Partitioning  Sorting  has  only  recently  begun 
to  receive  the  attention  it  deserves.  With  the  Cumulative 
Distribution  Function  adaptation,  it  can  be  made  to  handle  all 
types  of  unknown  distributions  equally  well.  The  space 
considerations  can  also  be  minimized  as  can  the  run  times. 
Since  DPS  is  practical,  fast,  and  easy  to  implement,  serious 
consideration  should  be  given  to  it  by  the  programming 
community  as  a  viable  and  cost  effective  sorting  method. 
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ALGORITHMIC  COMPLEXITY 
Part  6 


by 
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EXPECTED  BEHAVIOR  OF  APPROXIMATION  ALGORITHMS 
FOR THE EUCLIDEAN TRAVELING SALESMAN PROBLEM

Abstract 

The  behavior  of  several  approximation  algorithms  for  the 
traveling  salesman  problem  is  considered  when  the  points  are 
randomly  allocated  in  the  Euclidean  plane  according  to  some 
known  distribution.  The  expected  length  of  the  tour  constructed 
by  an  algorithm  is  estimated  from  the  order  statistics  of  the 
distribution  of  the  distance  between  points.  The  approximation 
methods  considered  include  nearest  neighbor,  arbitrary  insert, 
nearest  and  cheapest  insert,  and  two  methods  based  on  finding 
the minimal spanning tree (including Christofides' algorithm).
For the distribution examined, all of the approximations are
shown to produce a tour whose expected length is $O(\sqrt{n})$, where n
is  the  number  of  points,  and  at  most  a  small  constant  factor 
(ranging  from  25.7%  to  87.5%)  from  optimal. 
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No.  F3060?-79-C-0124. 
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Salt  Lake  City,  Utah  84103. 
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EXPECTED  BEHAVIOR  OF  APPROXIMATION  ALGORITHMS 
FOR  THE  EUCLIDEAN  TRAVELING  SALESMAN  PROBLEM 


In  this  paper,  several  simple  polynomial  time  approximation 
algorithms  for  the  Traveling  Salesman  Problem  (TSP)  are  analyzed 
for  their  expected  performance  when  the  points  are  distributed  in 
two-dimensional  Euclidean  space.  This  version  of  the  TSP  may  be 
briefly  stated  as  follows. 

Given a set of points in a plane, find the minimum
length tour going through each point exactly once.

This  problem  has  a  long  and  interesting  history,  and  many 
attempts  at  its  solution  are  surveyed  in  Bellmore  and  Nemhauser 
[?].  Recently,  Garey,  Graham,  and  Johnson  [7]  and  Papadimitriou 
[15]  have  independently  shown  that  the  Euclidean  TSP  is 
NP-complete,  and  thus  it  appears  that  an  exact  solution  to  the 
problem for more than several points is computationally
infeasible.  As  a  result,  much  recent  interest  has  centered 
around  the  behavior  of  approximation  algorithms,  or  heuristics, 
for  this  problem. 

Rosenkrantz, Stearns, and Lewis [17] have investigated the
worst case performance of a number of approximation methods for
the TSP. The tenor of their work is to examine a specific
algorithm and bound the ratio of the length of the approximate
tour it produces to that of the optimal tour. They also attempt
to construct graphs for which the algorithm performs nearly as
badly as this ratio might imply. The best known guaranteed
approximation algorithm for the TSP is due to Christofides [3],
and always finds a tour whose length is at most 1½ times that
of the optimal solution.
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Worst  case  performance  analysis  provides  a  warning  to  users 
of  an  algorithm  how  far  from  the  optimum  the  method  might 
deviate.  Unfortunately,  results  of  this  nature  provide  little 
or  no  insight  as  to  the  typical  behavior  of  the  method.  The 
algorithm  with  the  best  worst  case  ratio  does  not  necessarily 
have  the  best  expected  one.  The  expected  performance  of  an 
algorithm  is  usually  more  difficult  to  ascertain.  One  has  to 
make assumptions about the distributions of inputs, and
realistic  assumptions  are  often  mathematically  intractable. 
Even  the  introduction  of  slightly  complex  heuristics  can  lead  to 
probabilistic  dependencies  that  can  be  extremely  difficult  to 
analyze. 

In  this  paper,  we  investigate  the  expected  length  of  the 
solution  to  an  n-point  TSP  when  the  points  are  randomly 
allocated  in  the  plane  according  to  some  given  probability 
distribution.  Using  techniques  from  order  statistics,  we  examine 
the  following  approximation  algorithms: 

.  nearest neighbor method
.  arbitrary insert method
.  nearest and cheapest insert methods
.  minimal spanning tree (MST) based method
.  Christofides' method

All of these methods are found to produce a tour whose expected
length is $O(\sqrt{n})$. We also bound the expected tour length from
below to show that the algorithms are optimal to within at most
a small constant factor. These results tend to confirm
experimental work in actually using the algorithms [9].
Furthermore, the results are significant in that the worst case
performances of some of the algorithms studied can vary greatly,
as shown by Rosenkrantz, et al [17]. The nearest and cheapest
insert and MST-based methods always produce a tour whose length
is at most twice that of the optimum, but the best known upper
bounds on the worst case ratio for the nearest neighbor and
arbitrary insert methods grow as log n. In fact, it has been
further shown that this logarithmic divergence is unavoidable
for the nearest neighbor algorithm.

Some prior related work has been done on the problem studied
in this paper. Employing techniques quite different from those
used here, Morozinskii [14] has shown that the expected length
of a tour constructed by the arbitrary insert method is $O(\sqrt{n})$
and within a factor of 4 of a lower bound on the expected tour
length. His result is quite general in the sense that it does
not assume any specific distribution of points, but only some
weak conditions about the way they are generated. Although our
results apply only to the specific distribution considered, our
bounds yield more concrete information about the actual tour
length. Furthermore, the techniques used in our derivations are
general and could be applied to other distributions.

In two frequently cited papers, Karp [10,11] describes an
algorithm based on dividing the points into a number of small
regions, constructing an optimum tour within each region, and
then joining the subtours. Although this algorithm is not
guaranteed to find a tour within any specified range of the
optimum, Karp states that the method solves the problem to within
$1+\epsilon$ "almost everywhere" for every $\epsilon > 0$. The success of Karp's
algorithm depends on a theorem (with a long and difficult proof)
by Beardwood, Halton, and Hammersley [1]. This result states
that the length of the optimal tour through n points in a
bounded plane region of area A is "almost always" proportional
to $\sqrt{nA}$ for sufficiently large n. Weide [18] has recently pointed
out that some confusion exists when interpreting and comparing
such results due to differences in (1) the probabilistic models
under which they are derived, and (2) the measures of convergence
used. The results of Karp and Beardwood, et al are proved within
what Weide calls the "incremental model", i.e., the n-th
instance of the problem differs only incrementally from the
previous one. Our results are proved for the "independent
model", in which the n-th problem in the sequence is totally
independent of previous ones. Weide has shown that results for
the independent model are stronger in the sense that they
subsume results for the incremental model, while the reverse
does not always hold. Another difficulty with the results of
Karp and Beardwood, et al is that they hold only in the limit as
the number of points tends to infinity, and hence the results do
not speak about moderate (and the usually interesting) values of
n. Our results are derived in a framework that is not plagued
by this difficulty.
1. Distribution of Points

Our objective is to derive theoretical bounds on the expected
lengths of tours constructed by various approximation algorithms
for the TSP when the points are distributed randomly in
two-dimensional Euclidean space. In this paper, we shall assume
that the Cartesian coordinates (x,y) of each point are generated
from a normal distribution with mean 0 and variance $\sigma^2$, denoted
$N(0,\sigma^2)$. This distribution obeys the statistical assumptions
made in previous work. It was selected to obtain concrete
quantitative results, and because it was quite tractable to
analyze. Although we deal here with the normal distribution,
the analytic techniques themselves are applicable to any
distribution of points that depends only on the origin and
decreases monotonically outside a circle of sufficiently large
radius.

One of the important statistical techniques that we shall use
in our analysis comes from the distribution of order statistics
[5,6,8]. Let $x_1, \ldots, x_m$ be a random sample of size m from
some probability density function f(x). We can find the
distribution functions of the order statistics $y_1, \ldots, y_m$,
where the $y_i$'s are the $x_i$'s arranged in order of magnitude
so that $y_1 < y_2 < \cdots < y_m$. From the joint distribution of the
$y_i$'s, other interesting and useful distributions, including
those of the maximum, the minimum, and the range, may be
derived. Specifically, the probability function of the
i-th smallest element of $\{y_j\}$ is given by

$$g_i(y)\,dy = \frac{m!}{(i-1)!\,(m-i)!}\,[F(y)]^{i-1}\,[1-F(y)]^{m-i}\,f(y)\,dy$$

where F is the cumulative distribution function corresponding
to f. In order to apply this technique to the TSP, it is
necessary that (1) the points should be generated independently
from the distribution, and (2) the distribution of the length of
the edge connecting any two random points should be known. We
now derive this distribution.

Lemma 1: The distribution function of the distance between
two points selected at random from $N(0,\sigma^2)$ is

$$F(t) = 1 - e^{-t^2/4\sigma^2}$$

and the expected distance between two points is $E(t) = \sqrt{\pi}\,\sigma$.

Proof: Let $(x_1,y_1)$ and $(x_2,y_2)$ be the coordinates
of two randomly selected points. Then, the distance z between
them is

$$z = \sqrt{(x_1-x_2)^2 + (y_1-y_2)^2}$$

Cramer [5] proves that if

$$w = \sqrt{\xi_1^2 + \xi_2^2 + \cdots + \xi_n^2}$$

where the $\xi_i$ are generated from $N(0,\sigma^2)$, the density of w is
given by

$$g(w) = \frac{2\,w^{n-1}\,e^{-w^2/2\sigma^2}}{2^{n/2}\,\sigma^n\,\Gamma(n/2)}$$

where $\Gamma$ denotes the gamma function. Since $(x_1-x_2)$ and
$(y_1-y_2)$ are differences of normal distributions, they are
generated by $N(0,2\sigma^2)$. Substituting n=2, we obtain

$$f(z) = \frac{1}{2\sigma^2}\,z\,e^{-z^2/4\sigma^2}$$

and hence

$$\mathrm{Prob}(z<t) = \int_0^t f(z)\,dz
  = \int_0^t \frac{1}{2\sigma^2}\,z\,e^{-z^2/4\sigma^2}\,dz
  = 1 - e^{-t^2/4\sigma^2}$$

Furthermore, the expected distance between two random points is

$$E(t) = \int_0^\infty t\,f(t)\,dt
  = \int_0^\infty \frac{1}{2\sigma^2}\,t^2\,e^{-t^2/4\sigma^2}\,dt
  = \sqrt{\pi}\,\sigma$$

q.e.d.
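
Lemma 1 can be spot-checked by simulation. The sketch below is
only a sanity check under stated assumptions: it draws pairs of
points with independent $N(0,\sigma^2)$ coordinates and compares the
sample mean distance and the sample distribution function with
$\sqrt{\pi}\,\sigma$ and $1 - e^{-t^2/4\sigma^2}$. The function names, trial
counts, and seeds are hypothetical choices.

```python
import math
import random


def pair_distances(sigma, trials, seed=1):
    """Distances between `trials` pairs of points whose coordinates
    are independent N(0, sigma^2) variates."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        dx = rng.gauss(0.0, sigma) - rng.gauss(0.0, sigma)
        dy = rng.gauss(0.0, sigma) - rng.gauss(0.0, sigma)
        out.append(math.hypot(dx, dy))
    return out


def lemma1_cdf(t, sigma):
    """F(t) = 1 - exp(-t^2 / (4 sigma^2)) from Lemma 1."""
    return 1.0 - math.exp(-t * t / (4.0 * sigma * sigma))


def lemma1_mean(sigma):
    """E(t) = sqrt(pi) * sigma from Lemma 1."""
    return math.sqrt(math.pi) * sigma
```

For example, the mean of pair_distances(1.0, 50000) should lie
close to lemma1_mean(1.0), up to sampling error.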


2. Nearest Neighbor Method

The nearest neighbor algorithm for the TSP may be briefly
described as follows.

One of the nodes is arbitrarily selected as the starting
point. Among all the nodes not yet visited, the one that is
closest to the current node is selected as the next to be
visited. After all the nodes have been visited, return to
the starting point.
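
The description above translates directly into code. The sketch
below is a straightforward quadratic-time rendering; the
function name and the choice of node 0 as the arbitrary starting
point are assumptions made for illustration.

```python
import math


def nearest_neighbor_tour(points):
    """Return (tour, length) for the nearest neighbor heuristic.

    `points` is a list of (x, y) pairs; the tour is a list of
    indices starting at node 0, and the length includes the
    closing edge back to the start."""
    if not points:
        return [], 0.0
    unvisited = set(range(1, len(points)))
    tour = [0]
    length = 0.0
    while unvisited:
        current = points[tour[-1]]
        # Closest node among those not yet visited.
        nxt = min(unvisited, key=lambda j: math.dist(current, points[j]))
        length += math.dist(current, points[nxt])
        tour.append(nxt)
        unvisited.remove(nxt)
    # Closing edge: return to the starting point.
    length += math.dist(points[tour[-1]], points[0])
    return tour, length
```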

Rosenkrantz, et al [17] have shown that this algorithm always
constructs an n-point tour whose length is at most $\tfrac{1}{2}\log_2 n$
times the optimal, and that there exist graphs for which its
tour is $\tfrac{1}{3}\log_2 n$ times the optimal. We now derive the expected
path length when the coordinates of the points are selected
from $N(0,\sigma^2)$.

Theorem 2: The expected length of the tour for n points
constructed by the nearest neighbor method, $T_{NN}(n)$, is bounded
from above by

$$T_{NN}(n) \le 2\sqrt{\pi}\,\sigma\,\sqrt{n-1} + O(\sqrt{\log n})$$

Proof: Suppose we start at an arbitrary node A. The first
edge in the tour is the shortest of the n-1 edges joining A to
all other points (see Figure 1).

Figure 1. n-1 edges emanate from an arbitrary point A.

The lengths of these edges follow the distribution F given by
Lemma 1 and are independent. By order statistics, the
distribution of $g_1$, the length of the shortest of these n-1
edges, is given by

$$g_1(t)\,dt = (n-1)\,[1-F(t)]^{n-2}\,f(t)\,dt$$

Thus, the expected length of the first edge is

$$E(L_1) = \int_0^\infty t\,g_1(t)\,dt
  = \int_0^\infty t\,(n-1)\,[1-F(t)]^{n-2}\,f(t)\,dt
  = \int_0^\infty t\,\bigl(-d[1-F(t)]^{n-1}\bigr)$$

Integrating by parts, we get

$$E(L_1) = \int_0^\infty [1-F(t)]^{n-1}\,dt
  = \int_0^\infty e^{-(n-1)t^2/4\sigma^2}\,dt
  = \sqrt{\pi}\,\sigma / \sqrt{n-1}$$

Similarly, the i-th edge added to the tour is the shortest of
the n-i edges from the current node to the remaining unvisited
points. Hence, by the properties of order statistics and
Lemma 1,

$$E(L_i) = \int_0^\infty [1-F(t)]^{n-i}\,dt = \sqrt{\pi}\,\sigma / \sqrt{n-i}$$
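
The closed form for the expected shortest edge can be checked
numerically. The sketch below, which assumes the distribution F
of Lemma 1, integrates $[1-F(t)]^k = e^{-k t^2/4\sigma^2}$ with the
trapezoid rule and compares the result with $\sqrt{\pi}\,\sigma/\sqrt{k}$, where
k stands for the number of candidate edges (k = n-i in the
text). The function names, the step count, and the truncation
point of the integral are assumptions.

```python
import math


def expected_min_edge_numeric(k, sigma, steps=20000):
    """Trapezoid estimate of the integral over t >= 0 of
    [1 - F(t)]^k = exp(-k t^2 / (4 sigma^2))."""
    upper = 12.0 * sigma / math.sqrt(k)  # integrand ~ e^-36 beyond here
    h = upper / steps

    def g(t):
        return math.exp(-k * t * t / (4.0 * sigma * sigma))

    total = 0.5 * (g(0.0) + g(upper))
    for j in range(1, steps):
        total += g(j * h)
    return total * h


def expected_min_edge_closed(k, sigma):
    """Closed form sqrt(pi) * sigma / sqrt(k)."""
    return math.sqrt(math.pi) * sigma / math.sqrt(k)
```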

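The closing edge of the tour joins the last node added to the
starting point. Denoting its length by $L_{cs}$, we obtain for the
total tour length

$$T_{NN}(n) = \sum_{i=1}^{n-1} E(L_i) + E(L_{cs})
  = \sum_{i=1}^{n-1} \frac{\sqrt{\pi}\,\sigma}{\sqrt{n-i}} + E(L_{cs})
  = \sum_{i=1}^{n-1} \frac{\sqrt{\pi}\,\sigma}{\sqrt{i}} + E(L_{cs})$$

$$\le \int_0^{n-1} \frac{\sqrt{\pi}\,\sigma}{\sqrt{x}}\,dx + E(L_{cs})
  = 2\sqrt{\pi}\,\sigma\,\sqrt{n-1} + E(L_{cs})$$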

Observe that $E(L_{cs})$ is at most equal to the expected value
of the longest edge joining the starting point A to any of the
other n-1 points (see Figure 1). By order statistics, the
distribution $g_{n-1}$ of the longest of n-1 edges is given by

$$g_{n-1}(t)\,dt = (n-1)\,F(t)^{n-2}\,f(t)\,dt$$

and its expected value by

$$E(g_{n-1}) = \int_0^\infty t\,g_{n-1}(t)\,dt$$

Gumbel [8, Sect. 6.3.8] shows that this quantity asymptotically
becomes

$$E(g_{n-1}) = 2\sigma\sqrt{\ln(n-1)} + \gamma\sigma / \sqrt{\ln(n-1)}$$

where $\gamma$ is Euler's constant, $\gamma = .5772$. q.e.d.
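
To make the two pieces of the bound concrete, the sketch below
evaluates the sum of the expected edge lengths
$\sqrt{\pi}\,\sigma/\sqrt{n-i}$, the integral bound $2\sqrt{\pi}\,\sigma\sqrt{n-1}$, and the
asymptotic estimate for the closing edge. The function names
are hypothetical, and the closing-edge estimate is only
meaningful for n of at least 3, since ln(n-1) must be positive.

```python
import math

EULER_GAMMA = 0.5772156649


def nn_edge_sum(n, sigma):
    """Sum over i = 1..n-1 of sqrt(pi) * sigma / sqrt(n - i)."""
    c = math.sqrt(math.pi) * sigma
    return sum(c / math.sqrt(n - i) for i in range(1, n))


def nn_integral_bound(n, sigma):
    """Upper bound 2 * sqrt(pi) * sigma * sqrt(n - 1) on the sum."""
    return 2.0 * math.sqrt(math.pi) * sigma * math.sqrt(n - 1)


def closing_edge_estimate(n, sigma):
    """Asymptotic expected longest of n-1 edges (requires n >= 3)."""
    ln = math.log(n - 1)
    return 2.0 * sigma * math.sqrt(ln) + EULER_GAMMA * sigma / math.sqrt(ln)
```

For n = 1000 and unit variance the closing-edge term is small
next to the integral bound, which is why it enters the theorem
only as a lower-order term.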

To find the expected length of $L_{cs}$, some authors [18] have
made the simplifying assumption that all points except the one
closest to the starting point are equally likely to be selected
as the last point added to the tour. To our knowledge, the
validity of this assumption has never been formally proven, and
there exists experimental evidence to the contrary [9]. If the
assumption holds, the expected length of the closing edge can be
estimated by the average distance from the starting point to all
points other than its nearest neighbor. As the number of points
increases, this quantity approaches the expected distance
between any two points, which is $\sqrt{\pi}\,\sigma$ by Lemma 1. This
distance, although independent of the number of points n, does
not significantly alter the length of tour bound we have derived.
3. Arbitrary Insert Method

The arbitrary insert algorithm for the TSP operates as follows.

1. Choose any node A as the starting point, and another
   arbitrary point B as the second node to be visited.
   Construct the tour going from A to B and back.

2. Randomly choose one of the nodes P not yet visited as
   the next point to be inserted. Find the node Q already
   in the tour which is closest to P. From the two nodes
   adjacent to Q in the tour, select the one R such that

   $$d_{PQ} + d_{PR} - d_{QR}$$

   is minimal, where $d_{ij}$ denotes the distance between
   points i and j. Add edges PQ and PR to, and delete QR
   from, the tour. Repeat this step until all points are
   included.
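
The two steps above can be sketched as follows. This is an
illustrative rendering, not a tuned implementation: the random
insertion order is produced by shuffling with a hypothetical
seed parameter, and the names arbitrary_insert_tour and
tour_length are assumptions.

```python
import math
import random


def tour_length(points, tour):
    """Length of the closed tour visiting `tour` in order."""
    m = len(tour)
    return sum(math.dist(points[tour[k]], points[tour[(k + 1) % m]])
               for k in range(m))


def arbitrary_insert_tour(points, seed=0):
    """Build a tour (list of indices) by arbitrary insertion."""
    n = len(points)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    tour = order[:2]  # step 1: the tour A -> B -> A
    for p in order[2:]:  # step 2: insert the remaining points
        # Q is the tour node closest to P.
        qi = min(range(len(tour)),
                 key=lambda k: math.dist(points[p], points[tour[k]]))
        q = tour[qi]

        def cost(r):
            # d_PQ + d_PR - d_QR for a neighbour R of Q.
            return (math.dist(points[p], points[q])
                    + math.dist(points[p], points[r])
                    - math.dist(points[q], points[r]))

        before = tour[(qi - 1) % len(tour)]
        after = tour[(qi + 1) % len(tour)]
        if cost(before) <= cost(after):
            tour.insert(qi, p)      # P goes between `before` and Q
        else:
            tour.insert(qi + 1, p)  # P goes between Q and `after`
    return tour
```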

Rosenkrantz, et al [17] have shown that this algorithm always
produces a tour whose length is within a factor of log n of
the optimal, but it is an open question whether this logarithmic
growth can actually be realized. Using a complicated proof,
Morozinskii [14] has shown that this algorithm constructs a tour
whose expected length is $O(\sqrt{n})$ and is within a factor of 4 from
the optimal for a general class of probability distributions
which includes the normal.

Theorem 3a: The expected length of the tour for n points
constructed by the arbitrary insert method, $T_{AI}(n)$, is bounded
from above by

$$T_{AI}(n) \le 4\sqrt{\pi}\,\sigma\,\sqrt{n-1}$$

Proof: Suppose we have i points in the tour, where $i \ge 2$.
The (i+1)-st point P is chosen at random from the remaining set
of n-i points (see Figure 2).

Figure 2. P is the i-th point to be inserted in the tour.
(i edges join P to the i points in the tour; n-i points remain.)

The expected value of the minimum distance $D_i$ from P to the i
points in the tour is

$$E(D_i) = \int_0^\infty [1-F(t)]^i\,dt = \sqrt{\pi}\,\sigma / \sqrt{i}$$

We must now compute the cost of adding point P to the tour.
Consider the situation illustrated in Figure 3. By the Triangle
Inequality,

$$d_{PQ} + d_{QR} \ge d_{PR}$$

Figure 3. To insert P, delete edge QR and add edges PQ and PR.

Hence, the expected cost of inserting P is at most twice $E(D_i)$,
and the total tour length can be bounded from above by

$$T_{AI}(n) \le \sum_{i=2}^{n-1} 2E(D_i) + 2E(L_{cs})$$

where $E(L_{cs}) = \sqrt{\pi}\,\sigma$ is the expected length of the random
starting edge in step 1 of the algorithm (see Lemma 1).
Therefore,

$$T_{AI}(n) \le 2\sum_{i=2}^{n-1} \frac{\sqrt{\pi}\,\sigma}{\sqrt{i}} + 2\sqrt{\pi}\,\sigma
  = 2\sum_{i=1}^{n-1} \frac{\sqrt{\pi}\,\sigma}{\sqrt{i}}
  \le 4\sqrt{\pi}\,\sigma\,\sqrt{n-1}$$

q.e.d.


The bound of Theorem 3a is quite conservative, since it uses
the Triangle Inequality as the basis for estimating the cost of
inserting the new point P. The Triangle Inequality actually
describes the worst case cost of inserting P. Let us examine
each of the three lengths involved in the computation of this
cost, $d_{PQ} + d_{PR} - d_{QR}$, more closely.

By applying order statistics, we determined the expected
value of $d_{PQ}$ to be $E(D_i)$, the expected minimum distance from
P to the i points already in the tour. Since point Q appears at
some random spot in the i-point tour being modified, we would
expect $d_{QR}$ to be an average length edge in this partial tour.
Thus, if we let $E(L_i)$ denote the expected length of the
i-point tour during the construction, then the expected value of
$d_{QR}$ is $E(L_i)/i$.

Finally, we consider $d_{PR}$. By the operation of the
algorithm, P is known to be closer to Q than to R and so
$d_{PR} \ge d_{PQ}$. By the Triangle Inequality, $d_{PR} \le d_{PQ} + d_{QR}$. Just
where $d_{PR}$ falls in this range is unknown, but a reasonable
assumption might be that the distance $d_{PR}$ is distributed uniformly
between the two limits. This would imply that the expected value of
$d_{PR}$ is $d_{PQ} + \frac{1}{2} d_{QR}$, and that the expected cost of inserting P is

    $E(d_{PQ} + d_{PR} - d_{QR}) = E(D_i) + \Big[ E(D_i) + \frac{E(L_i)}{2i} \Big] - \frac{E(L_i)}{i} = 2E(D_i) - \frac{E(L_i)}{2i}$

Since we have no formal basis for the validity of this
assumption, we will refer to it as the "reasonable insertion
hypothesis".

Theorem 3b: Under the reasonable insertion hypothesis,
$T_{AI}(n)$ is bounded from above by

    $T_{AI}(n) \le 2\sqrt{\pi}\,\sigma\sqrt{n-1} + O(1)$

Proof: From the above discussion, the expected length
$E(L_{i+1})$ of the (i+1)-point tour is equal to the expected
length of the i-point tour plus the expected cost of inserting
the (i+1)-st point P. A recurrence relation describing this
fact is

    $E(L_{i+1}) = E(L_i) + E(d_{PQ} + d_{PR} - d_{QR})$
    $E(L_{i+1}) = \frac{2i-1}{2i} E(L_i) + 2E(D_i)$

We are interested in solving this recurrence for
$T_{AI}(n) = E(L_n)$. This relation can be solved using the method
of summing factors, described in Lueker [12]. To do so, we need
an appropriate boundary condition which, from Step 1 of the
algorithm, is $E(L_2) = 2\sqrt{\pi}\,\sigma$.

A general recurrence relation of the form

    $x_{i+1} = f_i x_i + g_i$

has solution

    $x_n = \sum_{i=a}^{n-2} \Big( \prod_{j=i+1}^{n-1} f_j \Big) g_i + g_{n-1} + \Big( \prod_{i=a}^{n-1} f_i \Big) x_a$

where $x_a$ denotes the value of x at the boundary condition.
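The summing-factor solution can be checked numerically against direct iteration of the recurrence. The following is a minimal sketch; the function names `iterate` and `closed_form` are illustrative and not from the report, and lengths are measured in units of $\sqrt{\pi}\,\sigma$.

```python
import math

def iterate(f, g, x_a, a, n):
    """Apply x_{i+1} = f(i) * x_i + g(i) for i = a, ..., n-1 and return x_n."""
    x = x_a
    for i in range(a, n):
        x = f(i) * x + g(i)
    return x

def closed_form(f, g, x_a, a, n):
    """Summing-factor solution quoted in the text (valid for n >= a + 1)."""
    # Sum over i = a, ..., n-2 of (product over j = i+1, ..., n-1 of f_j) * g_i.
    total = sum(
        math.prod(f(j) for j in range(i + 1, n)) * g(i)
        for i in range(a, n - 1)
    )
    # Add the last inhomogeneous term and the propagated boundary value.
    return total + g(n - 1) + math.prod(f(i) for i in range(a, n)) * x_a

# Recurrence of the reasonable insertion hypothesis, in units of sqrt(pi)*sigma:
# f_i = (2i - 1) / (2i), g_i = 2 E(D_i) = 2 / sqrt(i), boundary value x_2 = 2.
f = lambda i: (2 * i - 1) / (2 * i)
g = lambda i: 2 / math.sqrt(i)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, iterate(f, g, 2.0, 2, n), 2 * math.sqrt(n - 1))
```

Printing the iterated value next to $2\sqrt{n-1}$ gives a quick way to compare the recurrence with the leading term claimed in Theorem 3b.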
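Hence, the solution to our recurrence is

    $E(L_n) = \sum_{i=2}^{n-2} \Big( \prod_{j=i+1}^{n-1} \frac{2j-1}{2j} \Big) 2E(D_i) + 2E(D_{n-1}) + \Big( \prod_{i=2}^{n-1} \frac{2i-1}{2i} \Big) E(L_2)$

Applying the inequality

    $\prod_{i=k}^{m} \frac{2i-1}{2i} \le \frac{\sqrt{k}}{\sqrt{m}}$

we obtain

    $E(L_n) \le \frac{2\sqrt{\pi}\,\sigma}{\sqrt{n-1}} \sum_{i=2}^{n-2} \frac{\sqrt{i+1}}{\sqrt{i}} + \frac{(2 + 2\sqrt{2})\sqrt{\pi}\,\sigma}{\sqrt{n-1}}$

The desired result follows from the following estimate of the
sum.

    $\sum_{i=2}^{n-2} \frac{\sqrt{i+1}}{\sqrt{i}} \le \sum_{i=2}^{n-2} \Big( 1 + \frac{1}{\sqrt{i}} \Big) \le (n-1) + 2\sqrt{n-2} - 4$

q.e.d.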
4. Nearest and Cheapest Insert Methods

The nearest insert algorithm operates similarly to the
arbitrary insert method except that the next point P to be
inserted into the tour, instead of being chosen arbitrarily, is
selected to be the point not yet in the tour which is closest to
any node already in the tour. (The second point selected is the
starting point's nearest neighbor.) Rosenkrantz, et al. [17]
have shown that this method always produces a tour whose length
is at most twice the optimal, and that there exist graphs for
which it performs virtually this badly.
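The selection and insertion steps just described can be sketched as follows. This is an illustrative implementation rather than the report's own code: the names `nearest_insert_tour` and `tour_length` are hypothetical, the brute-force search is chosen for clarity, and the rule for choosing which neighbor R of Q to disconnect (the cheaper of the two) is an assumption, since the text does not fix it.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as (x, y) pairs."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def nearest_insert_tour(points, start=0):
    """Return a tour (list of point indices) built by nearest insertion."""
    tour = [start]
    outside = set(range(len(points))) - {start}
    while outside:
        # Pick the outside point P and tour point Q with the smallest d_PQ.
        p, q = min(
            ((p, q) for p in outside for q in tour),
            key=lambda pq: dist(points[pq[0]], points[pq[1]]),
        )
        k = tour.index(q)
        prev_r = tour[k - 1]                 # neighbor of Q on one side
        next_r = tour[(k + 1) % len(tour)]   # neighbor of Q on the other side
        # Cost d_PQ + d_PR - d_QR for each choice of R; d_PQ is common to both.
        cost_prev = dist(points[p], points[prev_r]) - dist(points[q], points[prev_r])
        cost_next = dist(points[p], points[next_r]) - dist(points[q], points[next_r])
        # Assumption: delete the edge QR whose replacement is cheaper.
        tour.insert(k if cost_prev < cost_next else k + 1, p)
        outside.remove(p)
    return tour

def tour_length(points, tour):
    """Length of the closed tour visiting the points in the given order."""
    return sum(
        dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )
```

Replacing the `min` over all (P, Q) pairs by a random choice of P, followed by a search for its nearest tour point Q, would give a sketch of the arbitrary insert method analyzed in Theorems 3a and 3b.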

Theorem 4a: The expected length of the tour for n points
constructed by the nearest insert method, $T_{NI}(n)$, is bounded
from above by

    $T_{NI}(n) \le 4(2 - \sqrt{2})\sqrt{\pi}\,\sigma\sqrt{n-1}$

Proof: Consider the case when we have i points in the
tour. To get the (i+1)-st point, we take the minimum of the i
sets of n-i edges connecting each of the i nodes in the tour to
all of the remaining n-i nodes (see Figure 4).

[Figure 4. i sets of n-i edges join the i points in the tour with the n-i points not yet inserted.]

Observe that the expected minimum in i sets of n-i edges is
less than or equal to the expected minimum of the n-i edges
emanating from one of the i points already in the tour. Hence,
by Lemma 1 and order statistics, the expected distance from the
point P to be inserted to the nearest of the i points already in
the tour is upper bounded by $\sqrt{\pi}\,\sigma / \sqrt{n-i}$:

    E(min. of i sets of n-i edges) $\le$ E(min. of n-i random edges) $= \sqrt{\pi}\,\sigma / \sqrt{n-i}$


Let L' denote the length of the tour constructed through the
first $\lceil n/2 \rceil$ points, and L" be the addition to the length by the
remaining $\lfloor n/2 \rfloor$ points. Then,

    $T_{NI}(n) = E(L') + E(L'')$

Given the expected length of the i-th edge added to the tour,
our analysis proceeds as in the proof of Theorem 3a. We find

    $E(L') \le \sum_{i=1}^{\lceil n/2 \rceil - 1} 2E(D_i) \le 2\sqrt{\pi}\,\sigma \sum_{i=1}^{\lceil n/2 \rceil - 1} \frac{1}{\sqrt{n-i}} \le 2\sqrt{\pi}\,\sigma \sum_{i=\lceil n/2 \rceil}^{n-1} \frac{1}{\sqrt{i}}$


Now let the tour contain $\lceil n/2 \rceil$ or more points. Then finding
the minimum of i sets of n-i edges drawn from the nodes in the
tour to the remaining nodes is equivalent to finding the minimum
of n-i sets of i edges drawn from the nodes not in the tour to
the nodes already in the tour (see Figure 4). As before,

    E(min. of n-i sets of i edges) $\le$ E(min. of i random edges) $= \sqrt{\pi}\,\sigma / \sqrt{i}$

and

    $E(L'') \le \sum_{i=\lceil n/2 \rceil}^{n-1} 2E(D_i) \le 2\sqrt{\pi}\,\sigma \sum_{i=\lceil n/2 \rceil}^{n-1} \frac{1}{\sqrt{i}}$

Hence

    $T_{NI}(n) = E(L') + E(L'') \le 4\sqrt{\pi}\,\sigma \sum_{i=\lceil n/2 \rceil}^{n-1} \frac{1}{\sqrt{i}}$
    $\qquad \le 4\sqrt{\pi}\,\sigma \big[ 2\sqrt{n-1} - \sqrt{2(n-1)} \big] = 4(2-\sqrt{2})\sqrt{\pi}\,\sigma\sqrt{n-1}$

q.e.d.


Again, the bound of Theorem 4a is very conservative since it
is based upon using the Triangle Inequality for bounding from
above the cost of inserting each new point. We now wish to
investigate the improvement in tour length under the reasonable
insertion hypothesis introduced in the previous section.

Theorem 4b: Under the reasonable insertion hypothesis,
$T_{NI}(n)$ is bounded from above by

    $T_{NI}(n) \le \frac{\pi}{2}\sqrt{\pi}\,\sigma\sqrt{n} + O(1)$

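Proof: As in the proof of Theorem 3b, the reasonable
insertion hypothesis gives rise to the recurrence relation

    $E(L_{i+1}) = \frac{2i-1}{2i} E(L_i) + 2E(D_i)$

This time, the recurrence must be solved twice.

The expected length of the tour through the first $\lceil n/2 \rceil$ points is

    $E(L') = \sum_{i=2}^{\lceil n/2 \rceil - 2} \Big( \prod_{j=i+1}^{\lceil n/2 \rceil - 1} \frac{2j-1}{2j} \Big) 2E(D_i) + 2E(D_{\lceil n/2 \rceil - 1}) + \Big( \prod_{i=2}^{\lceil n/2 \rceil - 1} \frac{2i-1}{2i} \Big) E(L_2)$

where $E(D_i)$ and the boundary condition $E(L_2)$ are given by

    $E(D_i) \le \frac{\sqrt{\pi}\,\sigma}{\sqrt{n-i}}$ and $E(L_2) = \frac{2\sqrt{\pi}\,\sigma}{\sqrt{n-1}}$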


r 


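Using the inequality

    $\prod_{i=k}^{m} \frac{2i-1}{2i} \le \frac{\sqrt{k}}{\sqrt{m}}$

we find that

    $E(L') \le \frac{2\sqrt{2}\sqrt{\pi}\,\sigma}{\sqrt{n-2}} \sum_{i=2}^{\lceil n/2 \rceil - 2} \frac{\sqrt{i+1}}{\sqrt{n-i}} + \frac{2\sqrt{2}\sqrt{\pi}\,\sigma}{\sqrt{n}} + \frac{4\sqrt{\pi}\,\sigma}{\sqrt{(n-1)(n-2)}}$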

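The first of the three terms will dominate, and a good estimate
of this term may be obtained as follows.

    $\sum_{i=2}^{\lceil n/2 \rceil - 2} \frac{\sqrt{i+1}}{\sqrt{n-i}} \le \int_{2}^{n/2} \sqrt{\frac{x+1}{n-x}} \, dx \le (n+1) \Big( \frac{\pi}{4} - \frac{1}{2} \Big) + O(1)$

since the inverse sine and inverse tangent terms arising in the
integral both approach $\pi/4$ rapidly. Hence,

    $E(L') \le \sqrt{2} \Big( \frac{\pi}{2} - 1 \Big) \sqrt{\pi}\,\sigma\sqrt{n} + O\Big( \frac{1}{\sqrt{n}} \Big)$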

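For the second half of the tour, we must again solve the
recurrence.

    $T_{NI}(n) = E(L_n) = \sum_{i=\lceil n/2 \rceil}^{n-2} \Big( \prod_{j=i+1}^{n-1} \frac{2j-1}{2j} \Big) 2E(D_i) + 2E(D_{n-1}) + \Big( \prod_{i=\lceil n/2 \rceil}^{n-1} \frac{2i-1}{2i} \Big) E(L_{\lceil n/2 \rceil})$

where $E(L_{\lceil n/2 \rceil}) = E(L')$, which we just estimated, serves as the
boundary condition. Applying our usual inequality for $\prod (2j-1)/2j$
and remembering that $E(D_i) \le \sqrt{\pi}\,\sigma / \sqrt{i}$ in the second half of the
tour, we obtain

    $E(L_n) \le \frac{2\sqrt{\pi}\,\sigma}{\sqrt{n-1}} \sum_{i=\lceil n/2 \rceil}^{n-2} \frac{\sqrt{i+1}}{\sqrt{i}} + \frac{2\sqrt{\pi}\,\sigma}{\sqrt{n-1}} + \frac{\sqrt{\lceil n/2 \rceil}}{\sqrt{n-1}} E(L')$

Next, we bound the summation in the first term from above by an
integral.

    $\sum_{i=\lceil n/2 \rceil}^{n-2} \frac{\sqrt{i+1}}{\sqrt{i}} \le \frac{1}{2}(n-1) + (2-\sqrt{2})\sqrt{n-2}$

The theorem immediately follows by adding together the terms
growing as $\sqrt{n}$ in the sum for $E(L_n)$.

q.e.d.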


The cheapest insert algorithm operates somewhat like the
nearest insert method. Again, the next point P to be inserted is
chosen to be the node not yet in the tour which is closest to any
node already in the tour. Point P is inserted by finding the
edge QR already in the tour such that $d_{PQ} + d_{PR} - d_{QR}$ is minimized,
and deleting this edge while adding PQ and PR. Hence, this
algorithm inserts P at the least costly place. Rosenkrantz,
et al. [17] have shown that the worst case behavior of this
method is the same as that of the nearest insert algorithm.
Since the tour constructed by the cheapest insert method cannot
be longer than that constructed by the nearest insert method,
the upper bounds of Theorems 4a and 4b also apply to cheapest
insert.
5. MST-Based Method

The minimal spanning tree of a set of n points consists of
the n-1 edges connecting all the points in such a way that the
total length of the edges is minimized. Lewis and Papadimitriou
[13] and others have shown how to convert the MST into a TSP
tour whose length is at most twice that of the optimal tour.
Christofides [3] has further refined this method to produce a TSP
tour whose length is at most 3/2 times the optimum, to be described
in the next section.

We now proceed to explore the relationship between the length
of the optimal TSP tour, denoted |OPT|, and the length of the
MST, denoted |MST|. Since the optimal TSP tour can be converted
into a spanning tree (not necessarily the minimal one) by
removing one edge, we have

    $|MST| < |OPT|$

Furthermore,

    $|OPT| \le 2|MST|$

with equality when all the points are collinear. The validity
of this latter claim will be clarified in what follows.

The tour building technique described in Lewis and
Papadimitriou is based on the observation that the MST can be
converted into a tour visiting all the points by traversing each
edge twice and returning to the origin, as illustrated in Figure
5a. This twice-around-the-tree tour is then converted into a
legitimate TSP tour by shortcutting any previously visited
points and proceeding directly to the next unvisited point, as
shown in Figure 5b. It is easy to see that the length of the
TSP tour produced is bounded above by twice the length of the
MST.

[Figure 5. a) MST and tour with length = 2|MST|. b) TSP tour based on the MST.]


We now proceed to bound the expected length of the MST from
above using Prim's algorithm [16]. This method for constructing
the exact MST may be briefly described as follows.

    Arbitrarily choose any node as the starting point, and
    include it in the tree. From among all the nodes not
    yet in the tree, select the one that is closest to any
    tree node, and add this node and the corresponding edge
    to the tree. Continue this procedure until all nodes
    are included.
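The procedure just described can be sketched in a few lines. This is a minimal O(n^2) illustration for points in the plane, not code from the report; the function name `prim_mst` and the choice of node 0 as the arbitrary starting point are assumptions made for the example.

```python
import math

def prim_mst(points):
    """Return the MST of the given (x, y) points as a list of index pairs."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # distance from each node to its closest tree node
    parent = [-1] * n       # tree node realizing that distance
    best[0] = 0.0           # arbitrary starting point (assumed to be node 0)
    edges = []
    for _ in range(n):
        # Select the node not yet in the tree that is closest to any tree node.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # Update the closest-tree-node distance of the remaining nodes.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges
```

The length of the i-th edge appended to `edges` is the quantity whose expectation is bounded in the proof of Theorem 5.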

Theorem 5: The expected length of the MST for n points,
$L_{MST}(n)$, is bounded from above by

    $L_{MST}(n) \le 2(2-\sqrt{2})\sqrt{\pi}\,\sigma\sqrt{n-1}$

Proof: The situation when the i-th edge is added is
identical to that for the nearest insert algorithm. To get the
length of the i-th edge added, we take the minimum of the
i(n-i) edges connecting each of the i nodes already in the tree to
the remaining n-i points (see Figure 6).

[Figure 6. i(n-i) edges join the i points in the MST with the n-i points still to be added. They may be viewed as i sets of n-i edges or as n-i sets of i edges.]

As in the proof of Theorem 4a, for $1 \le i \le \lfloor n/2 \rfloor$ we have

    $E(L_i)$ = E(min. of i sets of n-i edges) $\le$ E(min. of n-i random edges) $= \sqrt{\pi}\,\sigma / \sqrt{n-i}$

We may obtain a better bound for the remaining nodes added by
considering the directions of the edges to be reversed. Again,
as in Theorem 4a,

    $E(L_i)$ = E(min. of n-i sets of i edges) $\le$ E(min. of i random edges) $= \sqrt{\pi}\,\sigma / \sqrt{i}$

for $\lfloor n/2 \rfloor < i \le n-1$. Hence,

    $L_{MST}(n) = \sum_{i=1}^{n-1} E(L_i) \le 2(2-\sqrt{2})\sqrt{\pi}\,\sigma\sqrt{n-1}$

We note that the expected length of the MST can be bounded
from below by

    $L_{MST}(n) \ge \sqrt{\pi}\,\sigma\sqrt{n-1}$

using the technique to be described in the proof of Theorem 7a.
This result follows from the observation that the MST contains
n-1 edges, each of whose expected length is at least as great as
the expected distance from a point to its nearest neighbor.
Hence, the bound of Theorem 5 is quite tight since $2(2-\sqrt{2}) \approx 1.17$.

The MST-based algorithm described above produces a TSP tour
whose expected length, $T_{MSTB}(n)$, is bounded from above by

    $T_{MSTB}(n) \le 2 L_{MST}(n) \le 4(2-\sqrt{2})\sqrt{\pi}\,\sigma\sqrt{n-1}$

In fact, we should expect the length of the TSP tour constructed
to be significantly less due to the shortcutting procedure.
Unfortunately, the geometric and statistical techniques
necessary to obtain a good estimate of this improvement have not
yet been identified.
6. Christofides' Method


The MST-based algorithm described in the previous section
can be regarded as consisting of four basic steps.

1.  Construct the minimal spanning tree.

2.  Convert the MST into a multigraph consisting of two
    edges for each edge in the MST (e.g., the dotted edges
    in Figure 5a).

3.  Construct an Eulerian tour of the multigraph produced
    in Step 2. An Eulerian tour traverses each of the
    edges in a graph exactly once, returning to the origin.

4.  Convert the Eulerian tour of Step 3 into a legitimate
    TSP tour by shortcutting edges to previously visited
    points.
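The four steps above can be sketched compactly. The sketch relies on an observation that is an assumption of this example rather than a statement from the report: walking twice around the tree and skipping already visited points yields the depth-first preorder of the MST, so Steps 2 through 4 collapse into a single traversal. The names `mst_adjacency` and `mst_based_tour` are illustrative.

```python
import math

def mst_adjacency(points):
    """Step 1: build the MST (Prim's method) as an adjacency list."""
    n = len(points)
    adj = {i: [] for i in range(n)}
    # best[v] = (distance to closest tree node, that tree node); tree starts at 0.
    best = {i: (math.dist(points[0], points[i]), 0) for i in range(1, n)}
    while best:
        v = min(best, key=lambda i: best[i][0])
        _, u = best.pop(v)
        adj[u].append(v)
        adj[v].append(u)
        for w in best:
            d = math.dist(points[v], points[w])
            if d < best[w][0]:
                best[w] = (d, v)
    return adj

def mst_based_tour(points):
    """Steps 2-4: shortcut the twice-around-the-tree walk (DFS preorder)."""
    adj = mst_adjacency(points)
    tour, seen, stack = [], set(), [0]
    while stack:
        u = stack.pop()
        if u in seen:
            continue            # shortcut: skip a previously visited point
        seen.add(u)
        tour.append(u)
        stack.extend(reversed(adj[u]))
    return tour
```

Because the points lie in the Euclidean plane, each shortcut can only shorten the walk, so the resulting tour is no longer than twice the MST length.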

It is well-known that a connected multigraph contains an
Eulerian tour if and only if the degree of each of its vertices
is even. Such a graph is called an Eulerian multigraph.
Clearly, the procedure of Step 2 ensures that this condition
will be met.

Christofides [3] has discovered another way of converting
the original MST into a TSP tour yielding an even better
performance guarantee on the length of the path constructed.
His method is the same as above except that Step 2 is changed to
the following.

2'. Construct the Eulerian multigraph consisting of all the
    MST edges plus the edges in the minimal weight matching
    on the vertices of odd degree in the MST.

A matching on a set of 2m vertices V is a partition of V into m
disjoint 2-element sets. Associated with the matching is the
set of edges PQ, one for each 2-element set {P,Q}. The minimal weight
matching on V is the one in which the total sum of its
associated edge lengths is smallest.
The operation of Step 2' is illustrated in Figure 7. Since
the matching adds one new edge incident with each vertex of odd
degree in the MST, all of the vertices in the multigraph are of
even degree and the existence of an Eulerian tour is guaranteed.
Furthermore, the perfect matching must exist since the number of
vertices of odd degree in the MST is even, according to another
well-known result from graph theory.

[Figure 7. a) MST, odd degree vertices circled. b) Minimal odd vertex matching added.]


We now explore the relationship between the lengths of
Christofides' tour (denoted |CM|), the minimal odd matching
(denoted |MOM|), the minimal spanning tree (|MST|), and the
optimal TSP tour (|OPT|). Since the length of the Eulerian
tour constructed in Step 3 equals |MST| + |MOM|, we have

    $|CM| \le |MST| + |MOM|$

We observed in Section 5 that |MST| < |OPT|. It also turns out
that $|MOM| \le \frac{1}{2}|OPT|$. This occurs because the optimal TSP tour can be
converted into a tour T through the vertices of odd degree in the
MST by shortcutting any edges passing through the even vertices.
Clearly, $|T| \le |OPT|$. Furthermore, T contains two matchings on
the odd vertices, formed by taking every other edge, and the
length of the shorter of these cannot exceed $\frac{1}{2}|OPT|$. (See
Figure 8.) We conclude that

    $|CM| < \frac{3}{2}|OPT|$

Cornuejols and Nemhauser [4] have further shown that this bound
is tight by exhibiting instances of the problem for which the
algorithm performs this badly.

[Figure 8. Optimal TSP tour with odd degree vertices in MST circled. Shortcut tour through these vertices contains two matchings.]

The expected length of the tour produced by Christofides'
algorithm can be bounded above by the sum of the expected
lengths of the MST and MOM. Since we already considered the
length of the MST in Section 5, we turn our attention to the
problem of determining the expected length of the matching. The
number of points participating in the matching varies from one
problem instance to another. All n points participate in the
worst case, although we would anticipate this situation to arise
only rarely. Unfortunately, we do not know of any techniques
for determining the expected number of points in the matching,
and this remains an interesting open question.

Another  difficulty  arises  in  estimating  the  expected  length 
of  the  minimal  matching.  All  of  the  algorithms  studied  so  far 
adhere  to  the  "greedy"  design  paradigm.  That  is,  they  make  a 
series  of  decisions  on  how  to  proceed  based  on  finding  the 
smallest  edge  with  a  certain  property.  Order  statistics  lends 
itself  nicely  to  examining  the  expected  behavior  of  such 
procedures.  However,  we  know  of  no  such  greedy  algorithm  for 
the  optimal  matching.  Instead,  we  shall  bound  its  expected 
length  from  above  by  analyzing  a  greedy  matching  heuristic  which 
does  not,  in  general,  produce  the  best  match.  This  method 
operates  as  follows. 

Randomly  select  a  point  and  pair  it  with  its  nearest 
neighbor.  From  the  remaining  n-2  points,  randomly  select 
one  and  pair  it  with  the  nearest  unmatched  point.  Repeat 
the  procedure  n/2  times,  when  all  points  will  be  paired. 
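The heuristic can be sketched as follows. The function name `greedy_matching` and the optional `rng` parameter are illustrative choices for this example, not part of the report; an even number of points is assumed.

```python
import math
import random

def greedy_matching(points, rng=None):
    """Pair up the points by repeatedly matching a random point to its
    nearest unmatched neighbor; returns a list of index pairs."""
    rng = rng or random.Random(0)
    unmatched = list(range(len(points)))
    pairs = []
    while len(unmatched) >= 2:
        # Randomly select one of the remaining points.
        p = unmatched.pop(rng.randrange(len(unmatched)))
        # Pair it with the nearest point that is still unmatched.
        q = min(unmatched, key=lambda j: math.dist(points[p], points[j]))
        unmatched.remove(q)
        pairs.append((p, q))
    return pairs

def matching_length(points, pairs):
    """Total length of the edges associated with the matching."""
    return sum(math.dist(points[p], points[q]) for p, q in pairs)
```

Averaging `matching_length` over many samples of normally distributed points would give an empirical value to set beside the bound derived in Theorem 6.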

Theorem 6: The expected length of the matching for n points
constructed by the greedy matching heuristic, $L_{GM}(n)$, is
bounded from above by

    $L_{GM}(n) \le \sqrt{\pi}\,\sigma\sqrt{n-1}$

Proof: At the i-th iteration of the algorithm, the point P
selected randomly is paired with one of the n-2i+1 remaining
points. From order statistics and Lemma 1, the shortest of the
n-2i+1 edges joining P to the unpaired points has expected length

    $E(L_i) = \frac{\sqrt{\pi}\,\sigma}{\sqrt{n-2i+1}}$

Hence, the expected length of the entire matching is given by

    $L_{GM}(n) = \sum_{i=1}^{n/2} E(L_i) = \sqrt{\pi}\,\sigma \sum_{i=1}^{n/2} \frac{1}{\sqrt{n-2i+1}}$
    $\qquad \le \sqrt{\pi}\,\sigma \Big[ \int_1^{n/2} \frac{dx}{\sqrt{n-2x+1}} + 1 \Big] = \sqrt{\pi}\,\sigma\sqrt{n-1}$

q.e.d.

An obvious improvement on the greedy method is to pick the
shortest edge among any of the remaining points at each step.
Because of statistical dependencies between the edges, we cannot
say anything significant about this technique. However, we can
content ourselves with the following interesting fact. Although
the length of the matching produced by our simple greedy
heuristic can be quite bad, its expected value is within a
factor of two of the expected length of the minimal matching.
To see this, observe that the expected length of the minimal
matching on n points, $L_{MM}(n)$, can be bounded below by

    $L_{MM}(n) \ge \frac{n}{2} \cdot \frac{\sqrt{\pi}\,\sigma}{\sqrt{n-1}} \ge \frac{1}{2}\sqrt{\pi}\,\sigma\sqrt{n}$

since the matching contains n/2 edges, each of whose expected length
is at least as great as the expected distance from a point to its
nearest neighbor. A general discussion of such lower bounding
techniques for the TSP follows in Section 7.

Suppose cn points participate in the matching, where the
fraction c is such that $0 \le c \le 1$. Then, as a corollary to
Theorem 6, the expected length of the TSP tour constructed by
Christofides' algorithm, $T_{CM}(n)$, can be bounded above by

    $T_{CM}(n) \le L_{MST}(n) + L_{GM}(cn)$
    $\qquad \le 2(2-\sqrt{2})\sqrt{\pi}\,\sigma\sqrt{n-1} + \sqrt{c}\,\sqrt{\pi}\,\sigma\sqrt{n-1}$

In the worst case c = 1, yielding a bound of

    $T_{CM}(n) \le (5-2\sqrt{2})\sqrt{\pi}\,\sigma\sqrt{n-1}$

Under the reasonable assumption that half of the points are of
odd degree in the MST, c = 1/2 and

    $T_{CM}(n) \le (4-\frac{3}{2}\sqrt{2})\sqrt{\pi}\,\sigma\sqrt{n-1}$

As in Section 5, a better estimate of the savings resulting from
the shortcutting procedure would enable us to sharpen these
bounds.


7. Lower Bound on Optimal Tour Length

Theorem 7a: The expected length of the optimal tour through
n points, $T_{OPT}(n)$, is bounded below by

    $T_{OPT}(n) \ge \sqrt{\pi}\,\sigma\sqrt{n}$

Proof: Consider an arbitrary node in the graph. The
expected distance to its nearest neighbor is given by the
expected length of the minimum of the n-1 edges connecting the
node to all the other nodes in the graph. Using order
statistics, we have already seen

    E(distance to nearest neighbor) $= \frac{\sqrt{\pi}\,\sigma}{\sqrt{n-1}}$

Since the expected length of each of the n edges in the optimal
tour is at least the expected distance from a vertex to its
nearest neighbor, we have

    $T_{OPT}(n) \ge n \cdot \frac{\sqrt{\pi}\,\sigma}{\sqrt{n-1}} \ge \sqrt{\pi}\,\sigma\sqrt{n}$

q.e.d.


A  better  lower  bound  can  be  derived  by  noting  that  exactly 
two  edges  are  incident  with  each  point  in  any  tour.  In  the  proof 
of  Theorem  7a,  we  observed  only  that  the  expected  length  of  the 
shorter  of  these  edges  is  at  least  as  great  as  the  expected 
distance  from  a  point  to  its  nearest  neighbor.  But  the  longer 
of  the  two  edges  emanating  from  a  point  has  expected  length  at 
least  equal  to  the  expected  distance  from  a  point  to  its  second 
nearest  neighbor.  Using  this  observation,  we  now  derive  a  better 
lower  bound. 

Theorem 7b: The expected length of the optimal tour through
n points, $T_{OPT}(n)$, is bounded below by

    $T_{OPT}(n) \ge \frac{5}{4}\sqrt{\pi}\,\sigma\sqrt{n}$
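Proof: Let $D_1$ and $D_2$ denote the distance from a point P
to its nearest and second nearest neighbors, respectively.
Attributing half of the length of any tour edge to each of its
endpoints, we have

    $T_{OPT}(n) \ge n \big[ \frac{1}{2} E(D_1) + \frac{1}{2} E(D_2) \big]$

As in Theorem 7a, $E(D_1) = \sqrt{\pi}\,\sigma / \sqrt{n-1}$. The distribution $g_2$ of
the second shortest of the n-1 edges incident with a point P is
given by order statistics to be

    $g_2(t)\,dt = (n-1)(n-2) F(t) [1-F(t)]^{n-3} f(t)\,dt$

and its expected value is

    $E(D_2) = \int_0^\infty t\, g_2(t)\,dt$
    $\qquad = \frac{(n-1)(n-2)}{2\sigma^2} \int_0^\infty t^2 \big[ e^{-(n-2)t^2/4\sigma^2} - e^{-(n-1)t^2/4\sigma^2} \big] dt$
    $\qquad = \sqrt{\pi}\,\sigma \Big[ \frac{n-1}{\sqrt{n-2}} - \frac{n-2}{\sqrt{n-1}} \Big] \ge \frac{3}{2} \cdot \frac{\sqrt{\pi}\,\sigma}{\sqrt{n}}$

Hence,

    $T_{OPT}(n) \ge \frac{1}{2} n [E(D_1) + E(D_2)] \ge \frac{5}{4}\sqrt{\pi}\,\sigma\sqrt{n}$

q.e.d.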
This states that the optimal tour, no matter how it is
constructed, will have an expected length of at least
$\frac{5}{4}\sqrt{\pi}\,\sigma\sqrt{n}$. This
is significant in that all of the algorithms discussed above
produce a tour whose length is within a small constant factor of
this lower bound. The excess over the lower bound ranges from a
low of 25.7% for the nearest insert algorithm under the reasonable
insertion hypothesis to a high of 87.5% for the MST-based method. As
mentioned in Section 4, the cheapest insert method performs at
least as well as nearest insert.
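The two percentages quoted above can be reproduced from the leading coefficients of the bounds. The sketch below takes those coefficients (in units of $\sqrt{\pi}\,\sigma\sqrt{n}$) as read from Theorems 4b, 7b and the MST-based bound of Section 5, and ignores the difference between $\sqrt{n}$ and $\sqrt{n-1}$; the names used are illustrative.

```python
import math

# Coefficient of sqrt(pi) * sigma * sqrt(n) in the lower bound of Theorem 7b.
LOWER_BOUND = 5 / 4

# Leading coefficients of the upper bounds, as read from the theorems.
UPPER_BOUNDS = {
    "nearest insert, reasonable insertion hypothesis": math.pi / 2,
    "MST-based method": 4 * (2 - math.sqrt(2)),
}

def excess_percent(coefficient):
    """Percentage by which an upper bound exceeds the lower bound."""
    return 100 * (coefficient / LOWER_BOUND - 1)

if __name__ == "__main__":
    for name, coefficient in UPPER_BOUNDS.items():
        print(f"{name}: {excess_percent(coefficient):.1f}% above the lower bound")
```

The same function can be applied to the coefficients of the other theorems to compare them with the lower bound.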

Using different techniques, Morozenskii [14] has shown that
the asymptotic expected length of an optimal tour is proportional
to $\sqrt{n}$ for any probability density which depends solely on the
distance from the origin and is monotonic outside some circle of
sufficiently large radius, and not merely for a normal
distribution. Morozenskii's derivation is based upon the
expected distance from a point to its nearest neighbor, rather
than both this distance and that of a point's second nearest
neighbor. A related result by Beardwood, et al. [1] states that
the length of the shortest closed path through n points in a
bounded plane region of area A is "almost always" proportional
to $\sqrt{nA}$ for sufficiently large n.
8. Summary and Conclusions

The famous traveling salesman problem of operations research
is NP-complete, even when the points are restricted to the
Euclidean plane [7,15]. Because of this apparent computational
intractability, one must resort to the use of approximation
algorithms which, in general, produce suboptimal tours. Previous
research has focused on the worst case behavior of such
approximations [17,3,4]. Such results tend to be overly
pessimistic since worst case data seldom, if ever, is encountered
in practice. Furthermore, one may still expect most reasonable
approximation methods to perform about equally well on random
inputs, even though the worst case performances of the
algorithms may vary greatly. Experience in working with several
approximations tends to confirm this hypothesis [9]. The primary
motivation for this work is to provide a theoretical basis for
explaining this intuition and experience.

In this paper, we applied the methods of order statistics to
estimate the expected lengths of the tours produced by several
approximation schemes for the Euclidean TSP. To do so, we
selected one specific distribution of points for extensive study.
A primary reason for choosing the two-dimensional normal
distribution was that it proved to be mathematically tractable.
Furthermore, this distribution conforms to all of the statistical
assumptions made in prior investigations, and the $O(\sqrt{n})$ tours
produced are also in line with previous work [1,14]. Hopefully,
the distribution is typical of this class so that one might
expect somewhat similar results to hold had a different choice
been made.
Our principal conclusion is that for the distribution
chosen, all of the approximation algorithms studied produce a
tour whose expected length is within a small constant factor of
optimal. One line of possible future research would be to
investigate the variance in path length associated with the
algorithms, again using order statistics. A low variance would
tend to reinforce our belief in the algorithm's ability to produce
generally good tours, whereas a high variance would make us more
skeptical of the method. Another possible line of investigation
would be to extend the results to other specific distributions
or, better yet, to general classes of distributions obeying
certain statistical assumptions.

Perhaps the most important contribution of this work is to
show how order statistics can be applied to say significant
things about the expected behavior of heuristics for the
Euclidean TSP. There is no reason why these techniques could
not be applied to other computational problems as well. One way
of coping with the apparent intractability of NP-complete
problems is to devise fast procedures which approximate the
optimal solution. To date, most research has focused on
deriving worst case performance guarantees for these methods,
while very little is known about their expected performance.
Since many of these approximations can be characterized as
"greedy" algorithms (i.e., they minimize or maximize some
criterion at each step), they would be good candidates for the
application of order statistics provided it is possible to
characterize reasonably the distribution of inputs. Further
explorations of this type could be most useful and interesting.
ALGORITHMIC COMPLEXITY
Part  7 

by 

Leonard  J.  Bass 

DATA  BASE  ACCESS  METHODS 
ABSTRACT 

A  survey  is  made  of  several  different  access  methods  for 
both  univariate  and  multivariate  range  queries.  These 
techniques  include  B-tree  and  extendible  hashing  as  univariate 
techniques  and  radix  bit  mapping  and  K-D-B  trees  as 
multivariate  techniques. 

All  techniques  discussed  are  currently  suitable  for 
practical  use. 
DATA BASE ACCESS METHODS

As the requirements for accessing large data bases have
grown, the techniques used in managing these accesses have
become more sophisticated. In this paper, several of these
techniques are surveyed. First we review K-ary and radix trees
which are utilized by the access methods discussed. Next we
discuss two different univariate access techniques, B-trees and
extendible hashing. Finally we present two multivariate access
methods: radix bit mapping and K-D-B trees. All of the
techniques discussed are currently suitable for practical use.

The problem we are discussing is the accessing of a data
base by the values of one or more of its variables. That is, a
large data file exists which contains records for many
variables and it is desired to retrieve the record(s) with the
specified values of certain variables. The forms of this
problem depend on the number of variables used to define the
records desired and whether these variables are defined
specifically (with a single value) or by a range of values.

More formally, if each record of the data base has
variables $X_1, X_2, \ldots, X_s$ then there are four degrees of
generality for this problem.

1)  For a fixed i and value v, locate all records with
    $X_i = v$ (univariate match)

2)  For a fixed i and u < v locate all records with
    $u \le X_i \le v$ (univariate range)

3)  For a set of variables $X_1, \ldots, X_r$, $r \le s$, and a
    corresponding set of values $v_1, \ldots, v_r$ locate all
    records with $(X_1, \ldots, X_r) = (v_1, \ldots, v_r)$
    (multivariate match on r variables)

4)  For a set of variables $X_1, \ldots, X_r$, $r \le s$, and two
    corresponding sets of values $u_i < v_i$, i = 1, ..., r,
    locate all records with $u_i \le X_i \le v_i$, i = 1, ..., r
    (multivariate range on r variables)

On a typical computer system the central processor operates
about 1000 times faster than the associated input/output
processor. Since a data base resides on an external device
(generally a disk drive) the most important measure of a data
base accessing algorithm is the number of I/O requests that
must be satisfied to execute the algorithm.

Furthermore, data bases change over time, and these changes
are reflected in modification of data items. A modification is
a deletion followed by an insertion. Another important
measure of an accessing algorithm is therefore how well it
adapts to changes in the underlying data base.

Our  focus,  then,  will  be  on  the  amount  of  I/O  necessary  to 
access  and  modify  a  collection  of  records  in  a  data  base. 

Notation 

We  are  dealing  with  a  data  base  of  N  distinct  records,  each 
with  its  own  physical  record  address.  Within  each  record  we 
have  s  special  variables  which  are  to  be  used  to  access  the 
data  base. 
For each query, we are given $r \le s$ pairs of values
$u_1, \ldots, u_r$ and $v_1, \ldots, v_r$ and we are looking for those
records $(X_1, \ldots, X_r)$ with $u_i \le X_i \le v_i$ for
$1 \le i \le r$. (Note that we are assuming that the variables in the
data base have been numbered in a certain order and any query must
be couched in terms of the first r of these variables. This is
certainly not true in practice but it simplifies our notation
and, for the purposes of analysis, any two variables are
interchangeable.)

Within the data base, for each pair of values $u_i$, $v_i$ we
have $M_i$ records which satisfy the condition $u_i \le X_i \le v_i$
and, for r pairs of values $(u_i, v_i)$, $i = 1, \ldots, r$, we have M
records which satisfy the condition $u_i \le X_i \le v_i$ for
$i = 1, \ldots, r$.

By choosing $u_i = v_i$ we have the exact match problem and
by making $u_i < v_i$ we have the range problem.
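The query forms above can be stated as a single predicate. The following is a minimal sketch, assuming records are tuples whose first r positions are the query variables; the names `matches` and `brute_force_query` are illustrative and do not come from the report. It scans every record, which is the baseline that the access methods surveyed here are meant to improve on.

```python
from typing import List, Sequence, Tuple


def matches(record: Sequence[float],
            lows: Sequence[float],
            highs: Sequence[float]) -> bool:
    """True if lows[i] <= record[i] <= highs[i] for the first r variables.

    r is the length of lows/highs; zip stops at the shortest sequence,
    so variables beyond the first r are ignored.
    """
    return all(u <= x <= v for x, u, v in zip(record, lows, highs))


def brute_force_query(records: List[Tuple[float, ...]],
                      lows: Sequence[float],
                      highs: Sequence[float]) -> List[Tuple[float, ...]]:
    """Scan every record: one 'read' per record, with no index at all."""
    return [rec for rec in records if matches(rec, lows, highs)]
```

Setting `lows == highs` gives the exact match problem; giving one pair gives the univariate case and several pairs the multivariate case.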

Trees 

A node of a k-ary tree based on variable X (called the key)
consists of k values of X together with k pointers to other
nodes.

If node $n_1$ contains a pointer to node $n_2$
then a) $n_1$ is called a parent of $n_2$
b) $n_2$ is called a child of $n_1$

Descendant is defined in the obvious manner from child.
Any node which has no children is called a leaf node.

A k-ary tree is a finite collection of nodes such that:

1) There is exactly one node (named the root) which is the
child of no other node.

2) Every node other than the root is the child of exactly
one node.

3) No node is a descendant of itself.

An ordered k-ary tree is a k-ary tree in which, for any
node, the k values of X are ordered $x_1 \le x_2 \le \ldots \le x_k$
and, for non-leaf nodes, the k pointers $p_1, \ldots, p_k$ are such
that $x_i \le x'$ for every value $x'$ in $p_i$ or any of its
descendants. I.e., $x_i$ provides a lower bound for any value in
any descendant.

We  will  only  be  dealing  with  ordered  trees  and  will  assume 
the  ordering  without  specific  reference. 

A k-ary tree provides a mechanism for searching the list of
values $x_1, \ldots, x_N$ to locate a particular value y. The
algorithm proceeds as follows:

0) Set P to be the root node.

1) Search node P with values $x_1, \ldots, x_k$ of the tree to
find i such that $x_i \le y < x_{i+1}$. If there is no such i
set i = k.

2) Retrieve node $p_i$.

a) If the node is not a leaf then set $P = p_i$ and repeat
step 1.

b) If the node is a leaf then if y is in $x_1, \ldots, x_N$ it
will be in node $p_i$.

Figure  1  gives  an  example  of  a  k-ary  tree  and  a  retrieval  from 
a  k-ary  tree. 
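The search steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the report's implementation: `Node` and `search` are hypothetical names, nodes are in-memory objects, and each node visited stands for one read. For a value below the first discriminant of a node the sketch descends into the first child, which is a choice made here for simplicity; such a value is not in the tree either way.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    values: List[int]                                      # ordered x_1 <= ... <= x_k
    children: List["Node"] = field(default_factory=list)   # empty for a leaf


def search(root: Node, y: int) -> Tuple[Node, int]:
    """Return (leaf, reads): the leaf that would hold y and the nodes read."""
    node, reads = root, 1                  # step 0: start at the root
    while node.children:
        # step 1: largest i with values[i] <= y (clamped to the first child)
        i = max(bisect_right(node.values, y) - 1, 0)
        node = node.children[i]            # step 2: retrieve node p_i
        reads += 1
    return node, reads                     # step 2b: y, if present, is in this leaf
```

The number of reads returned equals the length of the path from the root to the leaf, which is the quantity the following analysis is concerned with.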
Since we are concerned with I/O requests, we assume that k is
chosen so that retrieving a node requires exactly one read.
Searching for a value requires traversing the tree from the
root to a particular node and thus the worst case measure of
the number of read requests is the length of the longest path
from the root to a leaf. (This is the height of the tree.)

The height of an ordered k-ary tree is minimized if the
tree is kept balanced as points are inserted into or deleted
from the tree.

In the applications we will make of trees several points
can be made.

1) We are assuming the existence of a data base and the
various types of trees will provide an access path to
the data base based on the values of variables. It
does no good to find a value efficiently using a tree
structure unless we can subsequently locate the
associated data record. Thus, we will assume that the
values in the leaf nodes have associated with them the
appropriate record number.

2) Our definitions allow the values at the higher level
nodes to either appear again at lower level nodes or to
have the retrieval algorithm terminate when it
successfully finds a value at a higher level. In our
applications all values from the data base will occur
at the leaves. An implication of requiring all of the
data values to occur at the leaves is that the values
at the non-leaf nodes need not be values from the data
base. The values at the non-leaf nodes serve only to
discriminate between the values at the leaves. A type
of tree where only the first portion of a value is used
to discriminate is called a radix tree.


B-tree 

The first access method we shall examine is the B-tree of
Bayer and McCreight (2). In this section we present the
univariate version of this structure. In subsequent sections
we present two different applications of B-trees to solve the
multivariate range searching problem.

A B+-tree is an ordered k-ary tree where k is chosen to be
the maximum number of items that can be read with one read
(kept in a single page in a virtual memory environment). In a
B+-tree all of the $x_i$, $i = 1, \ldots, k$, appear in the leaf
nodes, regardless of whether they also appear in a non-leaf node.

The searching algorithm for a B+-tree we have already
given. We now give the algorithms for insertions and deletions
and then discuss these algorithms.

Insertion

To insert a new value y into an existing B+-tree, use the
following algorithm.

1) Search for y in the tree and locate the leaf node which
would contain y if it were already in the list.

2) Insert y into the node.

3) If there are now less than or equal to k values in the
node then exit.

4) There are now k+1 entries in the node. Split the node into
two nodes with as nearly the same number of elements as
possible (both with (k+1)/2 if k is odd, or one with
k/2 + 1 and one with k/2 if k is even). The two new nodes
are such that if x is in node A and y is in node B then
x < y.

5) The new node must now be reflected in the parent of the
split. Retrieve the parent, and insert the smallest value of
node B and a pointer to node B in the parent. Modify the
discriminant within the parent for A (if necessary).

6) Repeat steps 3-5 for the parent node, which has gained an
entry.

Figure 2 gives an example of the splitting process.
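Steps 2 through 4 of the insertion algorithm, restricted to a single leaf, can be sketched as below. This is a simplified illustration that assumes a leaf is a sorted Python list and ignores record numbers and the parent update; `insert_into_leaf` is a hypothetical name.

```python
from bisect import insort
from typing import List, Optional, Tuple


def insert_into_leaf(leaf: List[int], y: int, k: int
                     ) -> Tuple[List[int], Optional[List[int]], Optional[int]]:
    """Insert y into a sorted leaf of capacity k.

    Returns (A, B, separator). B and separator are None when the leaf
    still holds at most k values and no split is needed.
    """
    insort(leaf, y)                      # step 2: insert y, keeping order
    if len(leaf) <= k:                   # step 3: still fits in one node
        return leaf, None, None
    mid = (len(leaf) + 1) // 2           # step 4: A takes the extra entry if any
    a, b = leaf[:mid], leaf[mid:]
    return a, b, b[0]                    # smallest value of B goes to the parent
```

The returned separator is the value that step 5 inserts into the parent together with a pointer to node B.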

Deletions

To delete a value y from an existing B+-tree use the
following algorithm:

1) Locate value y in the tree in leaf node P.

2) Delete y from node P.

3) If there are greater than or equal to k/2 values in P
((k-1)/2 if k is odd) or P is the root then exit.

4) There are now k/2 - 1 entries in the node. Choose a sibling
(Q) of the node with the same parent. If node Q has more than
k/2 entries move one (either smallest or largest) entry
from Q to P. Reflect the new discriminant values in the parent
of Q and P and exit.

If node Q has k/2 entries then merge nodes Q and P into
node P and delete node Q.

5) Retrieve the parent of Q and call it P. Delete the reference
to Q from P.
Repeat steps 3-5.

It should be obvious from the insertion/deletion algorithms that
no node (excluding the root) will ever have fewer than k/2 items in
it. Thus the maximum possible height of the tree with N items
is $\log_{k/2} N$. Thus a retrieval will take at most $\log_{k/2} N$
reads. It should also be obvious from the insertion algorithm
that at most one split can occur at each level of the tree.
Retrieval and splitting are the only I/O operations required by
insertions. Thus at most $2 \log_{k/2} N$ operations are
required for an insertion.

Deletions also require at most one merger per level. Thus,
deletions also can be done in $2 \log_{k/2} N$ operations.
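As a worked instance of the height bound, the sketch below computes the smallest h with $(k/2)^h \ge N$ using integer arithmetic. The function name and the sample figures (k = 100 items per node, one million records) are illustrative assumptions, not values taken from the report.

```python
def max_height(n_items: int, k: int) -> int:
    """Smallest h such that (k // 2) ** h >= n_items.

    This is the ceiling of log base k/2 of N, the worst-case number of
    reads for a retrieval when every node is at least half full.
    """
    if k < 4:
        raise ValueError("k must be at least 4 so that k // 2 >= 2")
    half, height, reach = k // 2, 0, 1
    while reach < n_items:
        reach *= half        # one more level multiplies the reach by k/2
        height += 1
    return height
```

With these sample figures `max_height(10**6, 100)` is 4, so a retrieval needs at most 4 reads and, by the argument above, an insertion or deletion at most 8 operations.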

The type of B+-tree we have presented maintains all of the
data items in the leaves. Thus to solve the range query in one
dimension it is only necessary to search for the lower bound of
the range and then traverse the tree until the upper bound is
reached. This takes at most $\log_{k/2} N + 2M/k$ reads where M is
the number of data items in the range.

Note that the solution to the range query retrieves the
values in increasing order of the key.
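The one-dimensional range query can be sketched as a scan over the leaf level. This is a simplified illustration, assuming the leaves are available as a list of non-empty sorted lists in key order; a real B+-tree would reach the first leaf by the root-to-leaf search and then follow the tree. `range_query` is a hypothetical name, and only leaf reads are counted.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def range_query(leaves: List[List[int]], u: int, v: int) -> Tuple[List[int], int]:
    """Return (values x with u <= x <= v in key order, number of leaf reads)."""
    firsts = [leaf[0] for leaf in leaves]
    # Last leaf whose first value is below u; it may still contain u.
    start = max(bisect_left(firsts, u) - 1, 0)
    out: List[int] = []
    reads = 0
    for leaf in leaves[start:]:
        reads += 1
        out.extend(leaf[bisect_left(leaf, u):bisect_right(leaf, v)])
        if leaf[-1] > v:          # upper bound reached, stop traversing
            break
    return out, reads
```

Because the leaves are visited left to right, the values come back in increasing key order without any sorting step.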
Extendible Hashing

B-trees operate in a logarithmic number of I/O requests and
give the ability to retrieve records in order on the key being
searched. If it is not desired to retrieve records in key
order another univariate technique is available with a better
expected retrieval behavior (although not necessarily a better
insertion/deletion behavior).

This technique (extendible hashing (3)) is a combination of
radix trees and hashing, a well established technique for
randomized but repeatable access into a table. We assume a
general familiarity with hashing. A general introduction to
hashing is provided by Standish (5).

Hashing into a fixed size table (say of size n) consists of
two parts.

1) A randomizing function f such that if x is an arbitrary
key value and $y \le n$ the probability that $f(x) = y$ is
1/n. (f distributes the keys uniformly from 1 to n.)

2) If $x \ne y$ and $f(x) = f(y)$ and x is already in the table then
a method exists which will find a free cell to hold y.
This is called collision resolution.

Two problems exist with the standard hashing techniques.
These are

1) The table size, n, must be chosen a priori. Hashing
works well when no collisions occur. If n was chosen
too small for the particular set of data inserted into
the table then no good remedy exists.

2) It is not easily possible to access the values from the
table in a specified order. Since f was chosen to be a
randomizing function, it cannot simultaneously maintain
a particular order of the keys. This is only a problem
if retrieving in key order is a requirement of the
particular application.

The algorithm we now present removes the first of these
problems and allows the table size to grow dynamically. The
basic idea behind the algorithm is to build a radix tree using
the hashing function as the search mechanism for the tree.

The algorithm assumes the existence of a randomizing
function f such that $1 \le f(x) \le 2^n$ where n is chosen so that
$2^n$ is the largest possible table size. n = 32 is a typical
value.

At any point in time, there is a value d which reflects
essentially the table size. The first d bits of f(x) are used
as the radix with which to index into the hash table. Thus the
root of the radix tree is $2^d$ entries long. The tree has only
one level, aside from the root.

The retrieval algorithm works as follows for a key x.

1) Calculate f(x).

2) Retrieve the current depth, d, of the root. Use the first
d bits of f(x) to index into the root of the tree. The
value retrieved is a pointer to a leaf node which
contains x (if it is in the table).

3) Hash x into the leaf using standard hashing and collision
resolution techniques.

Figure 3 gives an example of this type of tree with d=3.

This algorithm takes exactly two read requests to retrieve
a value. The first request reads in the correct portion of the
root (no requirement exists that the entire root be retrievable
with one read). The second read retrieves the leaf node with
the desired value. The correct portion of the root can be
located directly from the first d bits of f(x) and thus no
searching need be done to find the pointer to the leaf.

Observe in Figure 3 that several locations in the root
point to the same leaf. This allows, for example, the doubling
in size of the root node without affecting any of the leaf
nodes. Thus if $d_1 d_2 d_3$ is a 3 bit binary number with
pointer P, by setting $d_1 d_2 d_3 0$ and $d_1 d_2 d_3 1$ to both
have pointer P we have increased d from 3 to 4, doubled the
size of the root and not affected the leaf nodes.

The insertion algorithm for the extendible hashing
structure will now be presented.

To insert a value x into the extendible hashing structure:

1) Locate the leaf node for x by the retrieval algorithm.

2) If the node is not full insert x by hashing and collision
resolution and exit.

3) If the node is full and the node is pointed to by several
places in the root (this can be detected efficiently) then
split the node into two nodes according to the division in
the parent node. Leave d unchanged.

4) If the node is full and is not pointed to by several places
in the root then the size of the root is doubled by
incrementing d, each non-affected pointer in the root is
replicated, and then the node containing x is split into
two as in 3).
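The retrieval and insertion steps can be sketched in memory as below. This is a minimal illustration under several assumptions that are not in the report: keys are distinct non-negative integers below 2**32, the randomizing function is a multiplicative hash chosen only for the example, each leaf is a Python set standing in for the hashed page, and each leaf records how many leading bits it is responsible for (`depth`), which is one way to detect whether several root entries point to it.

```python
class ExtendibleHash:
    """Root of 2**d pointers to leaves that hold at most `cap` keys each."""

    def __init__(self, cap=2, bits=32):
        self.cap, self.bits, self.d = cap, bits, 0
        self.root = [{"depth": 0, "keys": set()}]    # d = 0: a single leaf

    def _f(self, x):
        # Stand-in randomizing function (an assumption of this sketch).
        return (x * 2654435761) % (1 << self.bits)

    def _index(self, x):
        return self._f(x) >> (self.bits - self.d)    # first d bits of f(x)

    def contains(self, x):
        # Read 1: the root entry. Read 2: the leaf it points to.
        return x in self.root[self._index(x)]["keys"]

    def insert(self, x):
        while True:
            leaf = self.root[self._index(x)]         # step 1: locate the leaf
            if x in leaf["keys"] or len(leaf["keys"]) < self.cap:
                leaf["keys"].add(x)                  # step 2: room available
                return
            if leaf["depth"] == self.d:              # step 4: double the root
                self.root = [p for p in self.root for _ in (0, 1)]
                self.d += 1
            self._split(leaf)                        # step 3: split the leaf

    def _split(self, leaf):
        depth = leaf["depth"] + 1
        halves = [{"depth": depth, "keys": set()},
                  {"depth": depth, "keys": set()}]
        for key in leaf["keys"]:                     # redistribute on the next bit
            bit = (self._f(key) >> (self.bits - depth)) & 1
            halves[bit]["keys"].add(key)
        for i, p in enumerate(self.root):            # repoint the root entries
            if p is leaf:
                self.root[i] = halves[(i >> (self.d - depth)) & 1]
```

Doubling the root copies each pointer into two adjacent entries, which corresponds to appending a 0 or a 1 to the existing d-bit index, so no leaf is touched by the doubling itself.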

The  deletion  algorithm  is  similar  and  will  be  omitted. 

As  can  be  seen  from  the  insertion  algorithm  the  behavior  of 
the  extendible  hashing  algorithm  depends  heavily  upon  the 
uniformity  of  the  function  f.  In  the  worst  case  the  behavior 
of  the  algorithm  is  linear  in  n  but  both  analytic  and 
simulation  results  (3)  indicate  that  the  expected  behavior  of 
the  algorithm  is  somewhat  better  than  that  of  B-trees. 


Discussion 

Both  algorithms  discussed  provide  efficient  access  to  a  set 
of  keys.  B-trees  are  logarithmic  in  both  the  expected  and  the 
worst  cases.  Extendible  hashing  is  the  order  of  a  constant  in 
the  expected  case  for  retrievals  and  apparently  logarithmic  in 
insertions  (based  on  timing  charts  and  not  analytic  results). 

Both  provide  for  dynamic  modification  of  the  underlying 
data  base  and  respond  well  to  modifications. 


B-trees require at most [expression illegible in source] pages of
disk storage and allow for retrieval in the order of a key once
the lower bound has been found.

Extendible hashing takes 2 page accesses for a retrieval
and requires at least $n/k + \log n/k$ pages on the disk.
Extendible hashing also will not allow retrieval in key order
but only in f(key) order.

Multivariate  Retrieval 

We now turn to the more general case of finding those
records such that if $(u_j, v_j)$, $j = 1, \ldots, r$, are r ranges then
retrieve all records with $u_j \le x_j \le v_j$ for $j = 1, \ldots, r$.

If r=2 then this may be visualized geometrically by viewing
records in the data base as points in two space and
$(u_1, v_1)$ and $(u_2, v_2)$ as defining a rectangle in two
space. In this case we wish to retrieve all points that lie in
the rectangle. If r=3 we are in 3 space and are defining a
rectangular solid and in general we are defining a region in r
space.

The geometric interpretation will become useful in the
second algorithm presented, which solves the problem in
r-space. The initial algorithm we present will solve this
problem by iterating on the range problem for each of the keys
and thus essentially solves the problem by projecting the
rectangle onto each of the coordinates in turn. This algorithm
has been implemented in a statistical data management system
available on minicomputers (1).
Assume all the records of the data base are numbered
1, ..., n and assume that given $1 \le i \le n$ we can retrieve the
record easily. The algorithm we present maintains B-trees for
each of the r desired keys. Associated with each key is the
number of the record which contains it.

The B-trees are maintained permanently on the disk. When
responding to a particular query a radix tree is created. This
radix tree contains the current set of records that satisfy the
query. The philosophy behind the construction of the radix
tree is to view the leaves of the tree as being N consecutive
bits. If record i satisfies the current query then bit i will
be on, otherwise it will be off.

Viewing this bit map as a radix tree both reduces the
memory required (under reasonable assumptions) and simplifies
the retrieval from the tree. Suppose each leaf can hold k
bits. Then to locate record i in the radix tree use i/k as
the radix and interrogate the bit numbered i mod k in the
appropriate leaf.

E.g., if k=1024 and we wish to indicate the presence of
record 18360 then turn on bit 952 in the leaf pointed to by the
17th pointer in the root.

Using a radix tree rather than a standard bit map
introduces one additional node (the root) and allows the
omission of any leaf not referenced by a particular query.

If 256 pointers can be kept in the root and each node
contains 256 x 16 bits then $10^6$ records can be represented
with a tree of height two. Once the individual trees have been
constructed for each variable then they can be merged into a
tree which contains the desired subset. Figure 4 demonstrates
this process for one variable.

The algorithm for retrieving the desired subset for a
multivariate range query is

1) For each of the r keys (say j) construct the radix tree
that reflects those records with $u_j \le x_j \le v_j$.

2) Merge the r constructed radix trees by ANDing the
leaves together.

3) The resulting radix tree contains exactly the records
desired.
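The radix bit map and the merge step can be sketched as follows. This is a minimal in-memory illustration: the class name `RadixBitMap` and its methods are hypothetical, the root is a dictionary so that unreferenced leaves are simply absent, and each leaf is a Python integer used as a string of k bits. The record numbers for each key would in practice come from a range scan of that key's B-tree.

```python
class RadixBitMap:
    """Root maps i // k to a leaf; a missing leaf means all of its bits are off."""

    def __init__(self, k=1024):
        self.k = k
        self.leaves = {}                      # root: leaf number -> k-bit integer

    def locate(self, i):
        return divmod(i, self.k)              # (pointer in root, bit within leaf)

    def set(self, i):
        leaf, bit = self.locate(i)
        self.leaves[leaf] = self.leaves.get(leaf, 0) | (1 << bit)

    def test(self, i):
        leaf, bit = self.locate(i)
        return bool((self.leaves.get(leaf, 0) >> bit) & 1)

    def __and__(self, other):
        # Step 2: AND the leaves; a leaf absent from either map contributes nothing.
        out = RadixBitMap(self.k)
        for leaf in self.leaves.keys() & other.leaves.keys():
            bits = self.leaves[leaf] & other.leaves[leaf]
            if bits:
                out.leaves[leaf] = bits
        return out

    def records(self):
        # Step 3: list the record numbers whose bits are on, in record order.
        return [leaf * self.k + b
                for leaf in sorted(self.leaves)
                for b in range(self.k)
                if (self.leaves[leaf] >> b) & 1]
```

With k = 1024, `locate(18360)` gives `(17, 952)`, matching the example in the text.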

The difficulty of constructing the radix trees for a single
key depends upon the number of distinct pages referenced by the
range of values for that key. If the first d bits of the record
number are the same (where d is the length of i/k in bits)
only one page is referenced, etc. If $P_j$ pages are referenced
by the $j$-th key and $M_j$ is the number of distinct data
records in the $j$-th range then the construction of the $j$-th
prefix tree takes

$$\log_{k/2} n + \frac{2M_j}{k} + P_j$$

page references and the determination of the appropriate subset
takes

$$r \log_{k/2} n + \sum_{j=1}^{r} \left( \frac{2M_j}{k} + P_j \right)$$

page references.

Thus this algorithm works well if the projection on each
axis contains points that are clustered together in terms of
data base record number. It also works well if sufficient real
memory is available to hold the constructed prefix trees since
then no page faults would be generated.

Notice, however, that if r=1 and $u_1 = v_1$ and the keys
are unique, the construction of the prefix tree requires
one additional page reference beyond the B-tree retrieval. If
this page is permanently allocated in real memory then for the
case of a single unique identification variable this method
costs no additional input/output.

Since  this  algorithm  depends  upon  univariate  B-trees,  if 
the  data  base  is  updated  the  individual  B-trees  respond  to  the 
changes  as  already  discussed. 

Also,  once  a  desired  subset  is  defined  we  can  retrieve  the 
records  in  the  subset  in  the  order  of  a  particular  key  by 
retrieving  from  the  B-tree  for  that  key  and  using  the  radix 
tree  to  determine  whether  each  record  was  in  the  desired  subset. 

Finally,  since  the  data  structures  permanently  maintained, 
the  B-trees,  are  univariate  the  only  dependence  upon  more  than 
one  variable  is  in  response  to  a  specific  request.  Thus,  the 
B-trees  that  are  maintained  are  suitable  for  univariate 
requests  on  any  key  or  multivariate  requests  on  any  combination 
of  keys. 




K-D-B  Trees 


The final algorithm and data structure discussed provides
promise of a more efficient access for requests couched in
terms of a set of keys specified a priori. This structure is a
generalization of B-trees themselves to multiple dimensions.
For simplicity we present the 2-dimensional case; the higher
dimensional structures are similar.

One dimensional B-trees can be viewed geometrically as
providing a partitioning of an axis with (roughly) an equal
number of points in each interval. The k adjacent partitions
are grouped into one partition at the next higher level to
provide the access path.

The K-D-B tree (4) is a generalization of this geometric
view to higher dimensions. In the 2-dimensional case instead
of partitioning a single axis into intervals as in one
dimension, we partition the plane into rectangles. At the
lowest level each rectangle has (roughly) the same number of
points. At higher levels, the access paths are provided by
grouping rectangles from lower levels into larger rectangles.
See Figure 5 for a graphical representation of a 2-D-B tree.

Thus, a 2-dimensional range query defines a rectangle in
the plane and all rectangles in the 2-dimensional K-D-B tree
that overlap the desired region would be searched to retrieve
all the records that satisfy the query.
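The search rule (visit every rectangle that overlaps the query rectangle) can be sketched as below. This is an in-memory illustration only, assuming rectangles are given as inclusive `(x_lo, x_hi, y_lo, y_hi)` tuples; `Region`, `overlaps` and `range_search` are hypothetical names, and page layout and balancing are not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]      # (x_lo, x_hi, y_lo, y_hi), inclusive


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]


@dataclass
class Region:
    rect: Rect
    children: List["Region"] = field(default_factory=list)           # lower level
    points: List[Tuple[float, float]] = field(default_factory=list)  # leaves only


def range_search(node: Region, query: Rect, visited: List[Rect]):
    """Collect the points inside `query`, recording each region that is read."""
    visited.append(node.rect)
    if not node.children:                     # lowest level: test the points
        return [(x, y) for x, y in node.points
                if query[0] <= x <= query[1] and query[2] <= y <= query[3]]
    found = []
    for child in node.children:               # descend only into overlapping parts
        if overlaps(child.rect, query):
            found += range_search(child, query, visited)
    return found
```

The `visited` list plays the role of the I/O count: a region that does not overlap the query is never read.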
If, in fact, all of the lowest level rectangles had roughly
the same number of points, this structure would guarantee a
logarithmic worst case behavior. The problem is that no
insertion and deletion strategies currently exist which
guarantee a minimum number of points in a rectangle.

This is most easily seen when dealing with deletions,
although a similar problem exists with insertions. Recall that
the deletion algorithm for one dimensional B-trees provided for
merging two adjacent intervals when both had less than k/2
points. This works because two adjacent intervals also define
an interval. When dealing with rectangles, however, this does
not work. If A and B are two adjacent rectangles then one edge
of A must be a portion of an edge of B (or vice versa). If the
overlapping edges are not identical then the merger of the two
rectangles is not a rectangle. Thus, when deleting points,
either the definition of the regions in terms of rectangles
must be abandoned or the merger of two sparsely filled
rectangles must be abandoned.
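The merging obstacle can be made concrete with a small check. The sketch below assumes axis-aligned rectangles given as `(x_lo, x_hi, y_lo, y_hi)` tuples; `merge_if_rectangle` is a hypothetical helper written for this illustration, not part of the K-D-B tree algorithms.

```python
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]      # (x_lo, x_hi, y_lo, y_hi)


def merge_if_rectangle(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the union of two adjacent rectangles if it is a rectangle.

    The union is a rectangle only when the shared edges are identical;
    otherwise None is returned.
    """
    ax0, ax1, ay0, ay1 = a
    bx0, bx1, by0, by1 = b
    # Side by side: same vertical extent and a common vertical edge.
    if (ay0, ay1) == (by0, by1) and (ax1 == bx0 or bx1 == ax0):
        return (min(ax0, bx0), max(ax1, bx1), ay0, ay1)
    # Stacked: same horizontal extent and a common horizontal edge.
    if (ax0, ax1) == (bx0, bx1) and (ay1 == by0 or by1 == ay0):
        return (ax0, ax1, min(ay0, by0), max(ay1, by1))
    return None
```

In one dimension the corresponding check always succeeds for adjacent intervals, which is why the B-tree merge step has no counterpart here.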

Robinson (4) advocates eliminating the merger step when
deleting, which in the worst case could result in empty
rectangles. A similar problem results when doing insertions.

The mechanism for using K-D-B trees then is to build the
underlying data base first. Then build the K-D-B trees and use
ad hoc techniques to allow for such insertions and deletions as
may occur. Simulations show that the expected behavior in such
circumstances is very good.
Comparison

Figure 6 gives a table which compares the four algorithms
we have surveyed. They are compared from the point of view of
time to retrieve, time to update, and suitability for various
types of queries.

If only univariate exact match queries are expected then
either B-trees or extendible hashing provide the best
responses. The choice between those two should be based on the
dynamic nature of the data base. If the data base is
relatively static (few insertions or deletions) then extendible
hashing is recommended. If the data base is highly dynamic
then B-trees are recommended. If univariate range queries are
expected as well as exact match, then B-trees are recommended.

If multivariate exact match queries are expected with
little a priori knowledge of which variables are involved then
either B-trees or extendible hashing may be used to create the
radix bit map. Again the choice is based on the dynamism of
the data base.

If multivariate exact match queries are expected for a
specific set of variables then a K-D-B tree could be
constructed for those variables if the data base is not very
dynamic.

If multivariate range queries are expected then the choice
is between K-D-B trees and B-trees with radix bit mapping. If
the underlying data base is not highly dynamic and the queries
are always in terms of the same variables then use a K-D-B tree;
otherwise use the univariate B-trees with radix bit mapping.
Figure 4

Radix bit map after searching Figure 2
for $23 \le X \le 62$
(assume 8 entries per node)

Diagram label: Data Base Record K.
Values in root node are implied by having pointers.
Values in leaf nodes are maintained by turning on appropriate
bit in node.


Figure 6

Comparison of Methods

Method                             I/O Behavior for Modification

B-tree                             log (worst case)
Extendible Hashing                 log (expected)
Radix Bit Mapping Using B-trees    r log (worst case)
K-D-B                              log (if not too dynamic)



References 


1) Bass, L. J., "OATMAN FORTRAN User's Guide", URI Computer
Science Department, TR80-1456.

2) Comer, D., "The Ubiquitous B-tree", Computing Surveys,
Vol. ?, No. 3, September 1980.

3) Fagin, R., Nievergelt, J., Pippenger, N. and Strong, H. R.,
"Extendible Hashing - A Fast Access Method for Dynamic
Files", ACM Transactions on Database Systems, Vol. 4,
No. 3, September 1979.

4) Robinson, J., "The K-D-B Tree: A Search Structure for
Large Multi-Dimensional Dynamic Indexes", Proceedings
1981 ACM-SIGMOD Conference, May 1981.

5) Standish, T., "Data Structure Techniques", Addison-Wesley,
1980.




ALGORITHMIC COMPLEXITY
Part 8


by

Ralph E. Bunker
and
Leonard J. Bass


AN EXPERIMENTAL EVALUATION OF THE FRAME MEMORY
MODEL OF A DATA BASE STRUCTURE

ABSTRACT

Frame memory is an analytic model of a data base access
method. This model enables the prediction of access
performance measures in terms of user behavior parameters.
This is an important aspect of the automatic generation of data
structures.

In this study, a version of frame memory was implemented
and then a simulation study was performed to validate the
predictions of the analytic model against the implementation.

The model yielded good predictions (less than 10% error)
for most of the cases tested. The assumptions under which the
analytic results were derived were violated during a portion of
the simulation to test the robustness of the model and again
the analytic model yielded good predictions.
AN EXPERIMENTAL EVALUATION OF THE FRAME MEMORY
MODEL OF A DATA BASE STRUCTURE

A desirable goal of data base research is the automatic
generation of data base structures. A designer would specify
some limited number of characteristics of the data and would
have automatically returned the data structures, the access
items, and the access paths. A step in the direction of that
goal would be for the designer to furnish usage information and
a proposed storage structure, and to have returned the expected
response parameters. The frame memory model of storage
structure has been proposed as a mechanism for predicting
system response as a function of usage and structural
information. In this study, we report on an experimental
validation effort for frame memory.

Most attempts at automatic design involve the following
steps:

1. Determine how the users of the file system are planning
to use the system. This provides the necessary input
for the automatic design system. Usage is defined by
the different types of records in the system, their
lengths and fields, plus the expected frequencies of
additions, deletions, modifications, and retrievals to
records and subsets of records in the file.
2. Select a set of storage structures for the records
based on usage patterns defined in step 1.

3. Evaluate how this set of storage structures performs in
the anticipated environment. This evaluation must take
into account the change that the storage structures
will undergo due to maintenance.

4. Assign a rating to the set of storage structures based
on this evaluation. This rating will determine whether
or not the set of structures will be considered further
as a possible design choice.

5. Inform the designer as to the set of structures which
have received the best evaluations.

Frame  Memory 

We are interested here in what is involved in step 3 of the
design process. This step is complex partially because the
amount of time needed to retrieve data from a storage structure
rarely remains constant throughout the life of the storage
structure.

March (MAR78) has proposed that step 3 of the design
process be divided into two steps as follows:

3a. Compute the average time to perform fundamental
operations on the storage structure, taking into
account the effects of updates to the storage
structure. Fundamental operations include reading a
logical block, scanning a logical block of records for
a particular record, directly accessing a record, and
writing a logical block.

3b. Use information from step 3a to calculate the average
time to perform an operation of interest, which may
involve a number of fundamental operations. For
example, the operation of adding a record to a data
structure can involve first the operation of reading in
the logical block which will contain the record and
then writing the updated logical block.
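
The composition in step 3b can be illustrated with a minimal sketch. It assumes that step 3a has already produced an average time for each fundamental operation; the operation names and millisecond values below are hypothetical placeholders, not figures taken from March's analysis.

```python
# Hypothetical average times (ms) for the fundamental operations, as
# step 3a would supply them. The numbers are illustrative only.
FUNDAMENTAL_MS = {
    "read_block": 60.0,
    "scan_block": 2.0,
    "direct_access": 65.0,
    "write_block": 60.0,
}


def operation_time(steps):
    """Step 3b: sum the average times of the fundamental operations used."""
    return sum(FUNDAMENTAL_MS[name] for name in steps)


# Adding a record: read the logical block that will hold it, then write
# the updated block back.
ADD_RECORD = ["read_block", "write_block"]
print(operation_time(ADD_RECORD))  # 120.0
```

A simple sum is the most basic way to combine the fundamental operations; a fuller model could weight each step by how often it occurs.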

March  proposed  a  model  of  secondary  memory  which  he  called 
frame  memory.  He  also  analyzed  the  cost  of  using  this  model  to 
implement  retrievals  and  modifications  to  a  data  base.  The 
designer  would  specify  data  structure  and  retrieval 
requirements  in  terms  of  the  frame  memory.  The  cost  of 
satisfying  these  requirements  would  be  calculated  and  reported 
to  the  designer.  The  designer  could  then  choose  the  best  data 
structures. 

This  makes  sense  only  if  the  equations  used  to  predict  the 
performance  are  correct  and  there  is  an  implementation  of  frame 
memory  so  that  the  designer  can  then  use  this  implementation  to 
actually  access  the  data  structures  created. 

This  provides  the  motivation  for  the  study  reported  here. 
An  implementation  of  frame  memory  was  done  and  then  this 
implementation  was  tested  to  see  if  the  analysis  yielded 
correct  predictions.  Some  of  the  assumptions  within  which  the 
analysis  was  done  were  violated  to  test  the  dependency  of  the 
analysis  on  those  assumptions. 

The results indicate that the predictions were close to
experimental results for almost all cases.

From a user's perspective, frame memory partitions
secondary  memory  into  contiguous  and  directly  accessible  areas 
of  storage  called  frames.  A  frame  is  the  logical  unit  of  data 
transferred  between  main  and  secondary  memory.  Information  is 
maintained  in  contiguous  areas  within  frames  called  frame 
records.  Figure  1  gives  a  user  perspective  of  a  frame  memory. 
Frames  have  four  essential  functional  characteristics: 

1. Directly accessible records - As each new, possibly
variable length, record is stored in a frame it is
assigned a local identifier called a (frame relative
record) token. The association between the record and
this token is unaffected by subsequent frame storage
and maintenance operations.

2. Sequentially accessible records - Once a frame is
transferred to main memory its records may be
sequentially referenced in either their physical order
or in a user constructed logical order termed the
(frame referencing) stream (Figure 2).

3. Frame elasticity - A frame is capable of stretching to
accommodate arbitrary growth. This is the way in which
maintenance operations which change the number or size
of records are handled in this model. Frame growth (or
shrinkage) has no direct effect on other frame
functions but is reflected indirectly in frame
performance characteristics (Figure 3).

4. Record stream maneuverability - The inter-record
structure of the frame reference stream is maintained
by the frame memory and may be dynamically modified by
a frame user.

Three  implementation  questions  arise: 

1. What type of structure will support frame expansion
(function 3)?

2. What type of structure will be used to maintain the
internal order of frame records within a frame
(functions 2, 4)?

3. What types of structure will be used to maintain
tokens for records within a frame (function 1)?

The  implementation  was  subject  to  the  constraint  that  it 
must  conform  to  the  basic  assumptions  that  March  used  in 
analyzing  his  model  of  frame  memory.  The  fundamental  data 
structure  that  was  used  to  support  the  frame  expansion  function 
was a chained overflow structure. This allows the user to
perceive the frame as expandable while the frame memory
implementation actually decides how to handle the expansion.

A  fixed  amount  of  space  is  initially  allocated  in  secondary 
storage  for  each  frame.  This  initial  allocation  is  called  a 
prime  frame  and  the  area  of  secondary  storage  in  which  prime 
frames  are  allocated  is  called  the  primary  data  area.  The 
primary  data  area  spans  a  number  of  cylinders  of  a  disk. 
Within  each  of  these  cylinders  a  certain  percentage  of  space  is 
set  aside  for  the  primary  data  area.  The  remaining  space  is 
used for frame expansion and is called the local overflow
area. Records are stored in a prime frame until the space
allocated to the prime frame is exhausted, at which time an
additional allocation of secondary storage space called a frame
extent is made. There are two types of frame extents - local
extents and global extents. A local extent exists in the same
cylinder as the prime frame it is assigned to and is allocated
from the local overflow area. A global extent exists in a
global overflow area which is separate from the primary data
area. Figure 4 depicts the relation between these frame
components and cylinders. Since the global overflow area is
separate from the primary data area, the head of the disk must be
moved and hence access to it is more expensive. A local extent
will always be used if there is space available within the
local overflow area.

Frame  extent  allocations  (either  local  or  global)  can  be 
either  fixed  or  variable.  A  fixed  extent  is  generally  capable 
of  holding  several  records  whereas  a  variable  extent  is  only 
large  enough  to  store  the  record  which  caused  the  extent  to  be 
allocated. 

Once  an  extent  has  been  allocated  it  is  necessary  to 
associate  it  with  the  prime  frame  being  extended.  This  is  done 
by  maintaining  an  extent  index  in  the  prime  frame  which  points 
to  each  extent  associated  with  the  prime  frame.  Figure  4 
illustrates  this  method. 

The  expansion  structure  which  was  used  has  fixed  length 
extents  and  an  extent  index.  Although  an  extent  index  uses 
some space in a prime frame, the amount of space is usually
small compared to the size of records stored in the frame and
an index has the advantage of locating any extent in only one
access. Fixed length extents may waste space if variable
length records are used. They do reduce the number of extents
needed, however, which is desirable if an extent index is used.

Next,  the  maintenance  of  the  logical  frame  stream  will  be 
discussed.  This  is  basically  a  determination  of  how  the 
concept  of  "next  record"  will  be  implemented.  The  next  record 
is  the  one  which  would  be  physically  contiguous  to  the  current 
record  if  all  of  the  records  of  the  frame  were  contiguous.  We 
used  an  indexed  mechanism  for  maintaining  tokens.  That  is,  a 
pointer  is  maintained  for  each  frame  record;  a  token  is  a 
relative  pointer  count  from  the  beginning  of  the  index. 

In  review,  the  implementation  of  a  frame  memory  which  is 
used  in  this  research  has  the  following  characteristics: 

1.  Extents  are  fixed  length  and  maintained  by  an  index 
stored  on  the  prime  frame. 

2. The logical frame stream is maintained by address
sequential connections (i.e., the physical order of the
records corresponds to the logical order).

3.  Frame  record  tokens  are  maintained  by  a  token  index 
stored  in  the  prime  frame. 
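
The three characteristics just listed can be pictured with a small in-memory sketch. It is only an illustration under stated assumptions: the class names `Block` and `Frame` are hypothetical, sizes are counted in bytes, deletion is not modeled, and the 80-byte record length in the usage lines is an assumed value rather than one reported for the test system.

```python
class Block:
    """A prime frame or a fixed-length extent: a byte budget plus its records."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []

    def free(self):
        return self.capacity - sum(len(r) for r in self.records)


class Frame:
    def __init__(self, prime_size, extent_size):
        self.extent_size = extent_size
        self.prime = Block(prime_size)
        self.extent_index = []   # kept in the prime frame: one entry per extent
        self.token_index = []    # kept in the prime frame: token -> (block, slot)

    def _blocks(self):
        return [self.prime] + self.extent_index

    def add(self, record):
        """Append a record in physical order and return its token."""
        if len(record) > self.extent_size:
            raise ValueError("record does not fit in a fixed-length extent")
        last = self._blocks()[-1]
        if last.free() < len(record):
            last = Block(self.extent_size)      # allocate a new extent
            self.extent_index.append(last)
        last.records.append(record)
        block_no = len(self.extent_index)       # 0 means the prime frame
        self.token_index.append((block_no, len(last.records) - 1))
        return len(self.token_index) - 1

    def fetch(self, token):
        """Direct access by token: one index lookup, then one block."""
        block_no, slot = self.token_index[token]
        return self._blocks()[block_no].records[slot]

    def stream(self):
        """Records in physical order, which is also the logical order here."""
        for block in self._blocks():
            yield from block.records


frame = Frame(prime_size=720, extent_size=160)
tokens = [frame.add(bytes([i]) * 80) for i in range(12)]
print(len(frame.extent_index))  # 2
```

Nine records fill the prime frame, so the tenth addition allocates the first extent and the twelfth allocates the second. A token is simply a position in the token index, so it stays valid as extents are added.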

This  implementation  has  been  chosen  for  the  test  system  for 
two  reasons: 

1. The prediction of its performance measures involves a
complex analysis. The purpose of this implementation
is to verify the correctness of March's analysis of
frame memory performance; hence, it is appropriate to
choose an implementation which is difficult to analyze.

2. Other researchers have used a similar implementation
for experimental relational data base management
systems (STO76). This is a situation, therefore, in
which automatic data base design research may find a
valuable application in the future.

March analyzed frame memory in terms of two types of
parameters: usage and device. The usage parameters specify
the characteristics of how the frame is to be used. March
proposed the following usage parameters for his frame memory:

NR - the number of records initially loaded into the
memory.

LR - the average length (characters) of a stored
record. Records may be of variable length.

NADD - the number of additions per time period.

NDEL - the number of deletions per time period.

RINT - the reorganization interval. This is the time period
at the end of which the frame memory will be taken off
line and all extents will be incorporated into prime
frames. This usually happens when performance of the
frame memory has deteriorated significantly due to
update-generated overflow chains. All performance
measures are averages over this interval. This
measurement standard is inspired by the idea that the
best file organization is the one with the maximum
average performance over its lifetime.

Some of his device parameters are:

TLOC - the average disk latency time.

TRAN - the average disk seek time.

TFRTE - the data transfer rate between main memory and
disk storage.

MLMB - the size of a track (bytes) on the disk used to
store the frame memory.

LSZE - the amount of data storable in a cylinder.

Other  usage  parameters  (those  connected  to  the  device 
usage) : 

PALF  -  the  primary  area  load  factor.  This  is  the 
percentage  of  each  cylinder  that  is  used  for 
prime  frames.  The  remaining  space  on  the 
cylinder  is  used  for  the  local  overflow  area. 

FMLF  -  the  frame  memory  load  factor;  the  proportion  of 
the  space  allocated  to  prime  frames  that  is 
required  to  hold  initially  loaded  data.  If  this 
is  less  than  one,  then  each  prime  frame  has  some 
free  space  to  use  before  allocating  an  extent. 

FRAE  -  the  length  of  a  frame  extent. 

Figure 5 illustrates the mapping of an extended frame into
physical storage and illustrates parameters FMLF and FRAE.

March analyzed ten different performance measures. We
focus on:

1) FFPHY - the average time to retrieve a full frame in
physical stream order. A full frame consists of
a prime frame and whatever extents have been
generated for it.

2) FRTOK - the average time to retrieve a frame record by
its token.

The  experiments  we  performed  were  intended  for  two 
purposes.  First  we  validated  March's  analysis  by  using  the 
assumptions  he  made  in  doing  the  analysis.  Secondly,  we 
violated  his  assumptions  to  explore  the  limitations  of  the 
analysis.  In  general,  the  results  of  the  experiments  showed 
the  robustness  of  his  equations  without  regard  for  whether  the 
assumptions  were  maintained. 

Six  fundamental  assumptions  regarding  the  characteristics 
and  use  of  the  frame  memory  were  incorporated  by  March  into  his 
equations.  These  assumptions  are: 

1. March assumes only the most primitive type of
buffering. Only one prime frame or one extent can be in main
memory at a time and the current contents of the buffer aren't
checked before doing I/O. For instance, in the evaluation of
the time it takes to read a frame in stream order, it is
assumed that the frame has to be fetched from the disk. In a
more realistic buffering scheme, there is the possibility that
the frame is already in core and hence no I/O would be
necessary to fetch it.

This assumption is equivalent to the one that no frame in
the memory sustains consecutive additions or deletions of
records. That is, each addition or deletion operation involves
a frame different from the one used in the preceding
operation. We followed this assumption in the experiments.

2. The amount of disk storage space needed for extent and
token indices is small (less than 10 percent) compared to the
storage used for frame records. March ignores the effect of
overhead in several of his equations. There are many situations
in which this assumption is questionable. For instance, if frame
records are only ten characters long then overhead may consume
twenty to twenty five percent of the storage used for the
table. Hence, the percentage of overhead is a function of the
record length of a frame record and the length of a system
pointer. Both of these are implementation parameters. The
length of a system pointer is increased (thereby increasing the
overhead percentage) in the experiment which tests the effect
of altering this assumption.

3. Maintenance operations (addition, deletion, and
modification) are uniformly distributed over the set of prime
frames. This is a key assumption which has many exceptions.
For instance, it has been observed that in many data base
systems twenty percent of the records are involved in eighty
percent of the transactions on the data base. We tested the
effect of this assumption in one of the experiments.

4. The storage device containing the frame memory is
dedicated to only one user. In other words, there is no
concurrent use of the frame memory by two or more users and the
frame memory storage device experiences no activity other than
that initiated by the frame memory user. It is expected that
future models of a frame memory will allow more than one user
since concurrent use of a data base is one of the prime
motivations for the development of data base management
systems. A multiuser environment is approximated by randomly
changing the position of the disk head during the processing of
the frame memory updates and retrievals. There are three ways
in which the head can move corresponding to three different
usage situations. First, it has not moved since the last time
that a particular user fetched something from the frame
memory. Secondly, it has moved but has stayed within the frame
memory data set. This happens when two or more users are
concurrently using the frame memory. Third, it has moved
outside the frame memory data set. This movement would be
caused by a user accessing a data set other than the frame
memory data set. In the experiment testing the impact of many
users it is assumed that 50 percent of the time the head
doesn't move and 50 percent of the time it moves within the
frame memory data set. The frame memory is assumed to occupy
the entire disk so there is no movement outside the frame
memory.

5. The rate of maintenance and retrieval activity is
linear. In other words, there are no flurries of maintenance
activity followed by lulls of no activity. Also, the update
operations are dispersed uniformly. That is, there is not a
batch of additions followed by a batch of deletions. The
experiments that were performed adhere to this assumption.

6. The degradation of the frame memory is a function of
the difference between the number of additions and the number
of deletions and does not depend on the actual number of
additions or deletions. That is, two hundred additions and no
deletions cause the same degradation as four hundred additions
and two hundred deletions. This assumption is implicit in the
experiments performed since in all experiments only additions
are made to the frame memory.
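
The multiuser approximation described under assumption 4 can be sketched as follows. This is a minimal illustration, not the code used in the study: the helper names `perturb_head` and `seek_distances` are hypothetical, the disk is taken to have the 200 cylinders used in the experiments, and the choice of a uniformly random cylinder for a moved head is an assumption.

```python
import random

TCYLS = 200  # cylinders in the frame memory (value used in the experiments)


def perturb_head(head, rng, p_stay=0.5):
    """Return the head position seen by the next request.

    With probability p_stay the head has not moved since the last
    request. Otherwise another user is assumed to have left it at a
    uniformly random cylinder inside the frame memory data set.
    """
    if rng.random() < p_stay:
        return head
    return rng.randrange(TCYLS)


def seek_distances(targets, rng):
    """Cylinders crossed per request when the head may move between requests."""
    head, out = 0, []
    for cyl in targets:
        head = perturb_head(head, rng)
        out.append(abs(cyl - head))
        head = cyl
    return out


rng = random.Random(1)
print(seek_distances([10, 10, 10, 150], rng))
```

Repeated requests to the same cylinder cost nothing in the single-user case, but about half of them incur a seek once the head is perturbed in this way.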


Measurement  Techniques 

For  each  of  his  performance  measures,  March  calculates  a 
value  at  "steady  state".  He  defines  steady  state  as  the  time 
at  which  the  number  of  records  initially  loaded  into  the  frame 
has been doubled. He assumes his performance measures degrade
linearly from an initial value to the value at steady state.
He  calculates  the  performance  measures  at  steady  state  as  a 
function  of  physical  and  usage  parameters  such  as  average 
access  time  and  number  of  global  extents  generated. 
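
A consequence of the linear degradation assumption can be written out. The symbols here are introduced only for illustration: $M_0$ is the initial value of a measure, $DS$ its degradation at steady state, and $T$ the steady state time. Whether March averages in exactly this way is not stated in this section, so this should be read as a worked consequence of linearity rather than as his formula.

```latex
M(t) = M_0 + \frac{t}{T}\,DS, \qquad 0 \le t \le T,
\qquad
\bar{M} = \frac{1}{T}\int_0^T M(t)\,dt = M_0 + \frac{DS}{2}.
```

Under this reading, the average of a measure over the interval is its initial value plus half of the steady state degradation.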

As an experiment proceeded, the physical and usage
parameters needed by the predictive equations were gathered by
the implementation. These were the values used in calculating
the predicted values. The performance measures were also
measured during the course of an experiment. This provides
measured versus predicted performance from the start of an
experiment to the end of the experiment.

The performance measures taken during the course of the
experiment (average time to read all records in token order and
average time to retrieve a token) were assumed to have been
modified in an interval only by the records actually added
during that interval. Thus, the calculations for the
performance measures were done with each addition by adding an
incremental value (the time to access the record just added,
either in logical order or alone) to a running total. This
enabled the calculation of average values without actually
having to read all of the records.
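
The running-total technique can be shown in a few lines. This is a sketch under stated assumptions: the class name `RunningAverage` is hypothetical and the access times are made-up values in milliseconds.

```python
class RunningAverage:
    """Average access time maintained by a running total."""

    def __init__(self, total=0.0, count=0):
        self.total = total
        self.count = count

    def add(self, access_time):
        """Fold in the access time of the record just added."""
        self.total += access_time
        self.count += 1

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


# Hypothetical figures: three records already loaded at 40 ms each, then
# two additions, the second of which lands in an extent and costs more.
avg = RunningAverage(total=3 * 40.0, count=3)
for t in (40.0, 90.0):
    avg.add(t)
print(avg.mean)  # 50.0
```

Each addition costs constant work, so the average is available at every point of an experiment without rereading the stored records.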

Since the predicted values were based on actual physical
and usage parameters, any dependency upon the method of
estimating these values was eliminated from the experiments.

This methodology provided a means for testing the
prediction equations while still enabling variations in some of
the fundamental assumptions upon which the predictions were
based.

Experiments 

Six  experiments  were  performed.  Table  1  lists  the  values 
of the parameters that were varied for each experiment. A
brief rationale is now given for the choice of experiments.

TABLE 1 - PARAMETERS OF EXPERIMENTS

E#   DIST   SIZE   EPF   NADD   NDEL   PALF   USERS   LP
1    U      160    5     200    0      0.85   one     2
2    U      80     10    200    0      1.00   one     2
3    U      400    2     200    0      1.00   one     2
4    U      160    5     200    0      0.85   one     2
5    U      160    5     200    0      0.85   one     10
6    U      160    5     200    0      0.85   many    2

The meanings of the parameter mnemonics are:

E# - experiment number

DIST - the distribution used for selecting frames for updates.
Here U means uniform distribution and N normal
distribution.

SIZE - the size of an extent allocation

EPF - the number of extent sized blocks in a prime frame

NADD - the number of additions per unit time interval

NDEL - the number of deletions per unit time interval

PALF - percentage of a cylinder used for prime frames

USERS - the number of users competing for access to the frame
memory

LP - the length of a system pointer (in bytes)

Experiments 1-3 adhere to all of March's assumptions and
are a test for fundamental errors in his equations. Experiment
1 is run with a frame memory containing one local extent for
each frame. Extents can contain two records and at load time a
prime frame contains nine records. Experiment 2 uses a frame
memory with no local extents. Each extent can hold only one
record and, as before, the prime frame is big enough to hold nine
records. Experiment 3 has the same conditions as Experiment 2
except that an extent can contain four records.

The  remaining  experiments  test  the  effect  of  altering  the 
assumptions  which  March  used  for  his  analysis. 

Experiment  4  uses  a  normal  distribution  to  determine  which 
frames  get  updates  (assumption  3).  The  frame  memory  used  has 
local  extents  (one  per  prime  frame)  and  each  extent  can  contain 
two  records. 

Experiment  5  uses  a  large  value  for  the  length  of  a  system 
pointer  in  order  to  make  the  overhead  needed  for  each  record 
approximately  10  percent  of  the  record  length  (assumption  2). 
The  frame  memory  used  has  local  extents  (one  per  prime  frame) 
and  each  extent  can  contain  two  records. 

Experiment  6  tests  the  effect  of  other  users  competing  for 
use  of  the  frame  memory  (assumption  4).  The  frame  memory  used 
has  no  local  extents  and  each  extent  can  contain  only  one 
record. 

The following parameters were held constant for all of the
experiments:

1) (LOGOVHO) The track size of the logical frame memory
device was 3170 bytes.

2) (LOGOVHD) No overhead is needed for a block on the
logical frame device.

3) (LOGTPC) There are 12 tracks per logical cylinder.

4) (TCYLS) The frame memory has 200 cylinders.

5) (PHYMLMB) A physical track can contain 19254 bytes.

6) (PHYOVHD) The overhead for a physical block is 135
bytes.

7) (FMLF) Each frame was initially completely filled
with frame records.

8) (TLOC) The disk latency time for the logical frame
memory device was 36.3 milliseconds.

9) (ACCFUNC) Figure 6 illustrates the seek time function
used for the logical frame memory device.

10) (TEST) The frame memory operated in test mode.

11) (NUMFRMS) The number of frames is 1800.

12) (NR) The number of records initially loaded is
16200.

13) (RINT) Steady state is defined to occur after 81 unit
time intervals. This is the time when the
size of the file has doubled. This is the
criterion for steady state that was used by
March.

14) (NUMTRKS) The physical data set supporting the frame
memory uses 42 tracks.
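
Several of these constants are tied together by simple arithmetic, which the following sketch checks. The variable names are chosen here for readability and are not the report's mnemonics; the values are the ones stated in the text and in Table 1.

```python
NUM_FRAMES = 1800             # number of frames
RECORDS_PER_FRAME = 9         # records in a prime frame at load time
ADDITIONS_PER_INTERVAL = 200  # additions per unit time interval
DELETIONS_PER_INTERVAL = 0    # deletions per unit time interval

# Records initially loaded, and the number of unit intervals needed to
# double the file when only additions are made.
records_loaded = NUM_FRAMES * RECORDS_PER_FRAME
net_growth = ADDITIONS_PER_INTERVAL - DELETIONS_PER_INTERVAL
intervals_to_double = records_loaded // net_growth
print(records_loaded, intervals_to_double)  # 16200 81

# SIZE * EPF from Table 1 for experiments 1-3: the prime frame
# allocation comes out the same in each case.
PRIME_ALLOCATIONS = [size * epf for size, epf in [(160, 5), (80, 10), (400, 2)]]
print(PRIME_ALLOCATIONS)  # [800, 800, 800]
```

The 81 intervals given for steady state are therefore exactly the time needed to add another 16200 records at 200 additions per interval.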
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Results 


The  results  of  the  experiments  are  summarized  in  Table  2 
and  graphically  depicted  in  Figures  7-8. 

TABLE 2 - PERCENT-OF-ERROR STATISTICS

FFPHY

EXP    MEAN     STD     MAX     MIN
1     12.83   21.51   20.76    0.68
2      6.62   13.37   10.88    0.15
3     24.13   59.17   35.39    8.93
4      6.65    2.61    9.06    0.44
5      5.78    4.68    7.45    0.38
6      3.28    2.35    5.04    0.05

FRTOK

EXP    MEAN     STD     MAX     MIN
1      4.44    1.34    5.34    0.67
2     10.48    0.64   12.97    9.55
3      6.04    3.66    7.50    0.26
4      8.18    3.93    9.87    0.62
5      7.59    3.32    8.52    0.26
6      6.10    0.55    6.85    2.93

A brief overview of the results will be presented first,
followed by a more detailed analysis. A predicted value within
ten percent of the observed one is considered a good
prediction. Of course, more precise predictions are desirable,
but for the current state of the art in automatic data base
design, the ten percent error range will probably be accurate
enough. Each performance measure will be discussed separately.

1. Average time to retrieve a full frame in physical order
(FFPHY).

Experiments 1, 4, and 5 are all performed on equivalent
frame memories (i.e., local extents are available and two
records can fit on an extent). Experiment 1 uses all of
March's assumptions and the observed value of FFPHY is, on the
average, within 12.83 percent of the predicted value. Changing
the assumptions of a uniform distribution of updates
(experiment 4) and small overhead per record (experiment 5)
reduces the average error by about 50 percent in both cases.
Experiments 2 and 6 are also run on equivalent frame memories
(no local extents and one record per extent). For both, the
predicted value is well within 10 percent of the observed one.
Experiment 3 produced a large discrepancy between the predicted
and observed values (i.e., 24.13 percent).

2. Average time to retrieve a frame record by token (FRTOK).

All experiments but one produced observed values within 10
percent of the predicted values of FRTOK. The exception was
experiment 2, for which the average percent of error was 10.48
percent. In all cases the observed value was higher than the
predicted one.

In his analysis, March first computes averages for
performance degradation at what he calls steady state time.
This is merely the number of unit time intervals required to
double the number of records initially loaded in the frame
memory. The degradation at steady state for a performance
measure is denoted by DS(*measure).

DS(*FFPHY) = ANEXS * ATRES

DS(*FRTOK) = FVTKS * ATRES

where ANEXS is the average number of extents per frame at
steady state time, ATRES is the average time to read an extent
at steady state, and FVTKS is the probability, at steady state
time, that if a frame has extents then the desired record is in
an extent.
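
The two steady state products above are easy to evaluate. The sketch below uses the predicted ANEXS and ATRES listed for experiment 1 in Table 3. The function names are hypothetical, the results are in whatever time unit Table 3 uses, and the FVTKS value is a placeholder chosen only to exercise the formula.

```python
def ds_ffphy(anexs, atres):
    """Steady state degradation of FFPHY: extents per frame times read time."""
    return anexs * atres


def ds_frtok(fvtks, atres):
    """Steady state degradation of FRTOK: P(record in an extent) times read time."""
    return fvtks * atres


# Predicted ANEXS and ATRES for experiment 1, as listed in Table 3.
print(round(ds_ffphy(4.76, 76.7), 2))  # 365.09

# FVTKS = 0.5 is a hypothetical value, not a figure from the report.
print(round(ds_frtok(0.5, 76.7), 2))   # 38.35
```

Both measures scale with ATRES, so an error in the extent read time carries over to both, while an error in ANEXS affects only FFPHY.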

Table 3 contains the analytical and observed values of
ANEXS and FVTKS. In general, there is good agreement between
the observed and predicted values. An exception is experiment
5 (large system pointer overhead), for which there is not good
agreement for either variable. Table 3 also contains a breakdown
of the average time to read an extent into two different kinds of
averages. The first average (AVGE) is the average time to read
an extent given that the head is positioned at the cylinder
containing the extent (or prime frame) which immediately
precedes it in the chain of extents attached to the prime
frame. This is the average read time that is expected when a
frame is read or scanned. The second average (AVGD) is the
average time to read an extent given that the head is
positioned at the cylinder containing the prime frame to which
the extent belongs. This is the average read time for extents
when records are directly accessed. In general, AVGE is less
than AVGD since global extents may occupy the same cylinder or
cylinders which are close to one another. Therefore, it takes
less time to fetch an extent in the extent chain once the head
is positioned in the global extent area than it does to fetch
the same extent from the prime frame. March does not
distinguish between AVGE and AVGD but calculates one average,
ATRES, which is a function of the time to access a local extent
and the time to access a global extent. He assumes that the
time to access a global extent is TRAN (also listed in Table
3). For these experiments TRAN is the observed value for all
random accesses (including prime frames) over the steady state
time interval.


TABLE 3

PREDICTED AND OBSERVED PERFORMANCE MEASURE VARIABLES

        OBS      PRE                                 PRE
E#    ANEXS    ANEXS     TRAN     AVGE     AVGD    ATRES
1      4.80     4.76     86.7     71.3     88.5     76.7
2      9.54     9.03     79.7     67.3    102.0     80.3
3      2.20     2.20     94.8     94.9    103.0     95.5
4      4.89     4.76     80.8     67.7     91.6     72.1
5      5.58     4.76     86.3     71.4     91.7     77.4
6      9.54     9.03     87.5     81.8     97.9     88.2

Column headers are (all averages are averages at steady state):

E# - experiment number

PRE - abbreviation for predicted

OBS - abbreviation for observed

ANEXS - the average number of extents per frame

TRAN - the average time to do a random access

AVGE - the average time to read an extent from an extent chain

AVGD - the average time to read an extent from a prime frame

ATRES - overall average time to read an extent

VTK - the probability that a record is in an extent


In most cases the discrepancies can be explained by
observing the behavior of the variables which March used in the
calculation of the performance measure under discussion. For
this purpose, graphs have been provided which map the behavior
of FVTK (FVTKS is the value of FVTK at steady state) in Figure 9,
AVGE and AVGD in Figure 10, and ANEX (ANEXS is the value of ANEX
at steady state) in Figure 11.

FFPHY fared so poorly in experiment 3 primarily because of
the behavior of the variable ANEX, as shown in Figure 11. The
rapid acquisition of extents at the beginning of the experiment
introduced much more degradation at that stage than was
predicted by March's analysis. In our experiments, the first
update to a frame always causes an extent to be allocated.
Hence, at the beginning of experiment 3 the chance of an
extent being allocated is very high. Once a frame gets an
extent, however, it does not need another until four additions
have been made to it, since each extent can contain four
records. The same phenomenon can be observed in the ANEX curve
for experiment 1. It is less pronounced there since each extent
can contain only two records.
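
The stepwise allocation pattern described above can be sketched as a
small function. This is an illustration only: it assumes that every
record added to a frame overflows into an extent (that is, the prime
frame is already full), and the name extents_needed is hypothetical.

```python
import math


def extents_needed(additions, records_per_extent):
    """Extents held by one frame after `additions` records have been added,
    assuming every added record overflows into an extent."""
    if additions < 0 or records_per_extent < 1:
        raise ValueError("additions must be >= 0 and records_per_extent >= 1")
    return math.ceil(additions / records_per_extent)


if __name__ == "__main__":
    # Extent capacities of 1, 2 and 4 records, as mentioned in the text.
    for capacity in (1, 2, 4):
        steps = [extents_needed(k, capacity) for k in range(9)]
        print(f"{capacity} record(s) per extent: {steps}")
```

Under this assumption, with four records per extent the first addition
allocates an extent and the next allocation does not occur until the
fifth addition, so the extent count rises in steps. With one record per
extent the count rises by one for every addition.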

The variable ATRES also affects the behavior of the
performance measure FFPHY. As mentioned earlier, AVGE is the
average time to read an extent when extents are fetched
sequentially in chains. Therefore, the use of AVGE in the
equation for FFPHY is more accurate than the use of ATRES. The
AVGE curve (Figure 10) for experiment 3 indicates that the
value of AVGE remains fairly constant and is close to AVGD (the
average time to fetch an extent from a prime frame). This is
because extent chains tend to be short (since each extent can
hold four records), and hence most of the time to fetch a chain
is spent fetching the first extent, which is a direct access of
an extent (AVGD). Under these circumstances the theoretical
value of ATRES will be close to AVGE and will be a good
estimate of it (compare AVGE and AVGD in Table 3). Since ATRES
is accurate, it must be ANEXS which causes the inaccuracy of
the predictions of FFPHY in experiment 3.

The  AVGE  curve  for  experiment  1  is  more  interesting.  Here 
there  is  a  sharp  increase  in  the  average  time  to  read  an  extent 
at  the  beginning  of  the  experiment.  This  increase  is  due  to 
the  fact  that  local  extents  are  allocated  at  the  beginning  and 
these  can  be  accessed  quickly.  As  the  local  extent  areas 
become  full,  global  extents  are  allocated  and  the  average  time 
to read an extent increases. March's ATRES variable does not
capture  this  behavior  and  tends  to  be  higher  than  AVGE  (much 
higher  at  the  beginning  of  the  experiment). 

The AVGE curve for experiment 2 shows a decrease in the
average time to read an extent as the experiment progresses.
This is because no local extents are available, and as the
extent chains grow longer (and they will, since extents can
contain only one record) the time to move from one global
extent to another in the chain begins to have a greater effect
on the average time to read an extent. In spite of this
decrease in the value of AVGE, the value of FFPHY is predicted
closely in experiment 2. This is because ATRES is initially
close to AVGE but becomes a poorer estimate as time passes, and
degradation at the beginning of the experiment has a greater
effect on an interval average than degradation that occurs
later. Hence, March's analysis predicts the early state
average degradation well for experiment 2, and this tends to
compensate for the poorer later predictions when the interval
average is calculated. In contrast, for experiment 1 the
predicted early degradation is high, and this makes all
predicted interval averages high.
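
The weighting argument can be illustrated numerically. The sketch below
assumes that an interval average is the mean of a per-step quantity from
the start of the experiment up to the current step (our reading of the
term, not a definition taken from March), and it uses made-up error
series rather than measured data.

```python
from itertools import accumulate


def interval_averages(values):
    """Running mean of `values`: element i is the mean of values[0..i]."""
    return [total / (i + 1) for i, total in enumerate(accumulate(values))]


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical per-step prediction errors (percent), not measured data.
early_error = [20.0, 20.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # poor early, good later
late_error = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 20.0, 20.0]   # good early, poor later

if __name__ == "__main__":
    print("early-error series:", mean(interval_averages(early_error)))
    print("late-error series: ", mean(interval_averages(late_error)))
```

Both series end with the same interval average, but the series with the
early error carries that error into every interval average, so its mean
over all interval averages is much larger.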

The AVGE curves for experiments 4 and 5 exhibit the same
behavior as the one for experiment 1. However, the average
error in the predicted values for FFPHY for experiments 4 and 5
is lower than the error in experiment 1. This can be explained
as follows. For experiments 4 and 5 the local extents are
dissipated much more rapidly than in experiment 1. Experiment
4 concentrates most updates on only half the prime frame
cylinders, filling the local extent areas for these cylinders
quickly. In experiment 5, early overhead overflow causes the
allocation of extra extents. Hence the period of low
degradation at the beginning of the experiment is shorter and
does not have as much of an effect on the interval average.

For experiments 1, 4, and 5 the behavior of the AVGE
variable is most responsible for the inaccurate predictions of
FFPHY. This is particularly evident in experiment 5, where the
addition of extents to handle overhead overflow actually
improves the prediction of FFPHY (discussed above). Compare
this to the effect that uneven allocation had in experiment 3,
where ANEX was close to the analytic value throughout the
experiment.

The  following  insights  can  be  gleaned  from  the  preceding 
discussion: 

1.  if  AVGE  is  close  to  AVGD  then  March's  ATRES  will  be  a
good  estimate  of  the  average  time  to  read  an  extent  and 
the  behavior  of  the  ANEX  variable  will  determine  the 
average  prediction  error.  This  occurs  when  many 
records  can  fit  in  an  extent; 

2.  if  AVGE  and  AVGD  differ  greatly  then  ATRES  won't  be  a 
good  estimate  of  the  average  time  to  read  an  extent  and 
the  effect  of  uneven  and  unexpected  extent  allocation 
will  be  reduced.  This  occurs  when  extents  can  contain 
only  a  few  records; 

3.  if  degradation  is  predicted  accurately  at  the  beginning 
of  the  experiment  then  the  average  percent  of  error 
will  be  less  than  if  early  degradation  is  poorly 
predicted. 

The analysis of the behavior of the FRTOK performance
measure is much simpler. Most of the discrepancy between the
predicted and observed values appears to be caused by the
difference between AVGD and ATRES. Experiment 2, for which
FRTOK was predicted worst (10.46 percent average error), was
the one with the largest percent error between AVGD and ATRES
at steady state (see Table 3). Experiment 4 also had a large
error between AVGD and ATRES, but this was compensated for
somewhat by the fact that the predicted value of VTK was
slightly lower than the observed value.

The fact that AVGD is always greater than ATRES is
compatible with the fact that the observed value of FRTOK is
always greater than the predicted value.

VTK, the proportion of records in extents, is a secondary
cause of error in the value of FRTOK. In the absence of any
overhead-induced extents, it is equal to the function

     f(nadd) = nadd / (16200 + nadd)

where nadd is the number of additions and 16200 is the number
of records initially loaded. This is not linear but is nearly
so. Records that get pushed out of the prime frame by
expansion will raise the proportion of records in extents, but
not enough to seriously affect the calculation of FRTOK. For
instance, the value of FRTOK in experiment 5 (overhead-induced
overflow) has a 7.59 percent average error, whereas experiment 1
(very little overhead-induced overflow) has a 4.44 percent
average error.
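
The near-linearity of this function can be checked with a short sketch.
It reads the function as the ratio of added records to total records,
nadd / (16200 + nadd), which is our reconstruction of the formula; the
addition counts in the loop are illustrative and are not taken from the
experiments.

```python
INITIAL_RECORDS = 16200  # records initially loaded, from the text


def vtk(nadd, initial=INITIAL_RECORDS):
    """Proportion of records in extents after `nadd` additions, assuming no
    overhead-induced overflow: nadd / (initial + nadd)."""
    if nadd < 0:
        raise ValueError("nadd must be non-negative")
    return nadd / (initial + nadd)


def linear_approx(nadd, initial=INITIAL_RECORDS):
    """Tangent line at nadd = 0, i.e. nadd / initial."""
    return nadd / initial


if __name__ == "__main__":
    for nadd in (0, 500, 1000, 2000, 4000):  # illustrative addition counts
        print(nadd, round(vtk(nadd), 4), round(linear_approx(nadd), 4))
```

For addition counts that are small relative to 16200 the two columns
stay close, and the curve falls below the straight line as nadd grows.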

Finally, a comment is made on the ability of March's
analysis to predict the performance of a frame memory which is
used concurrently by two or more users. The frame memory used
to test the effect of more than one user had no local extents,
since the advantage of local extents is lost when the head may
move before the local extent is accessed. In our experiment,
additional users did not affect the number and distribution of
extents; they only affected the time to fetch an extent. Since
only global extents were used in the experiment, the average
time to access an extent (AVGD or AVGE) was close to the average
time to do a random access (see Table 4.3). Since March has
TRAN as one of the parameters to his equations and assumes that
ATRES = TRAN if there are no local extents, his predictions
were not adversely affected by a multiuser environment in
experiment 6.


Conclusion 


Given the assumptions made, March's analysis appears to
predict the performance of a frame memory satisfactorily (i.e.,
within 10 percent) for the use for which it was designed - as a
tool to aid in the development of automatic data base design
systems. Furthermore, his predictors are robust, since we
altered several of the assumptions and still observed
satisfactory results from his analysis. This robustness is
achieved mainly by the judicious choice of one of the
parameters of the analysis. This parameter, TRAN, is the
average time to do a random access in the frame memory. Unlike
the other parameters used, TRAN is far from obvious, and its
determination involves an insight into the performance
characteristics of the frame memory whose performance is being
analyzed.

The frame memories which failed to perform as predicted by
March's analysis were those which represented less than optimal
designs. One of them used small extents but had no local
extent areas, thereby forcing any addition to the frame memory
to be stored in the global extent area. Another one used large
extents, resulting in a lot of unavailable free space, and also
had no local extent areas. The frame memory which performed
best had medium sized extents and a local extent area.

March's analysis falters in the following situations:

1.  If large extents are used, non-linear extent allocation
results, leading to non-linear degradation. March
assumes linear degradation.

2.  When extents are in the global area, March's overall
estimate of the average time to read an extent is
incorrect. This occurs because during a frame read or
frame scan many extents in the global extent area are
close together, thereby reducing the average time to
fetch the next extent.

3.  The expansion of indices can push records out of the
prime frame into extents. This phenomenon produces a
large number of unexpected extents if the overhead per
record is greater than 10 percent. However, March's
analysis produced satisfactory results when the
overhead per record was approximately 10 percent.

In summary, the more extents a frame has, the more
inaccurate is March's estimate of the average time to read an
extent. The number of extents can be reduced only by making
extents larger, thereby causing a non-linear pattern of extent
allocations. This has been shown to invalidate March's
assumption of linear degradation. This situation is not as
hopeless as it might sound. A reasonably simple set of
heuristics could be developed to assure that the decisions
affecting these areas result in good data base designs which
will be predicted accurately by March's frame memory analysis.

Directions  for  Further  Research 

March has demonstrated the usefulness of viewing the
update-induced change of performance of a data base over time
as the sum of the measure of performance of the data base at
load time and a time-related measure of performance
degradation. His model of frame memory fits this approach very
well. Prime frames represent initial or non-degraded
performance, and extents represent degradation of performance.
His assumption of linear degradation was shown to cause
problems. This assumption would not be necessary if a
non-linear degradation function were developed for each
performance measure. These functions could be based on time
and the number of records per extent. Even greater precision
could be achieved if the effects of overhead expansion were
considered and the assumption that the size of records was
large compared to the overhead they require could be dropped.
A more rigorous analysis than the one we performed would be
necessary for the development of the non-linear degradation
functions.

March's  analysis  could  be  made  more  complete  if  it  didn't 
have  to  rely  so  heavily  on  the  average  time  to  do  a  random 
access.  Ideally,  what  would  be  supplied  as  a  parameter  is  the 
function  which  describes  the  time  to  move  the  disk  head  a  given 
number  of  cylinders.  The  average  time  to  do  a  random  access 
could  then  be  calculated  as  part  of  the  analysis. 
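
One way such a calculation might look is sketched below. It assumes
that the source and target cylinders of each access are independent and
uniformly distributed, which is our simplification and not part of
March's analysis. The seek function, latency, and transfer time are
hypothetical values chosen only for illustration.

```python
def average_seek_time(num_cylinders, seek_time):
    """Expected seek time between two cylinders drawn independently and
    uniformly from 0..num_cylinders-1, given seek_time(distance)."""
    if num_cylinders < 1:
        raise ValueError("num_cylinders must be >= 1")
    n = num_cylinders
    total = seek_time(0) / n                           # P(distance = 0) = 1/n
    for d in range(1, n):
        total += seek_time(d) * 2 * (n - d) / (n * n)  # P(distance = d)
    return total


def average_random_access(num_cylinders, seek_time, latency, transfer):
    """Average seek plus rotational latency plus transfer time."""
    return average_seek_time(num_cylinders, seek_time) + latency + transfer


def example_seek(distance):
    """Hypothetical seek curve in milliseconds (not from the study)."""
    return 0.0 if distance == 0 else 10.0 + 0.1 * distance


if __name__ == "__main__":
    # All device numbers here are illustrative assumptions.
    print(average_random_access(400, example_seek, latency=8.3, transfer=1.0))
```

Supplying the seek function rather than a single average lets the same
routine be reused when the number of cylinders in use changes, for
example when updates are concentrated on part of the disk.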


A question which arises is whether more sophisticated data
structures (e.g., B-trees, differential files, etc.) are
amenable to analysis in terms of initial performance and some
time-related degradation of performance. If so, the structures
could be classified according to their degradation functions.

Finally, an investigation could be made into transporting
these ideas to fields other than data base management. In
particular, the field of software quality measurement is
interested in the degradation of performance (and quality) of
programs caused by code changes. The work presented here has
offered a model of change-related degradation. Can this model
be adapted so that it provides a useful way of analyzing
program quality?